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^0 I Abstract 

CO ' This paper considers large-scale simulations of wave propagation phenomena. We argue that 

it is possible to accurately compute a wavefield by decomposing it onto a largely incomplete set 
of eigenf unctions of the Helmholtz operator, chosen at random, and that this provides a natural 
way of parallelizing wave simulations for memory-intensive applications. 

Where a standard eigenfunction expansion in general fails to be accurate if a single term 
. is missing, a sparsity-promoting £i minimization problem can vastly enhance the quality of 

' synthesis of a wavefield from low-dimensional spectral information. This phenomenon may be 

seen as " compressive sampling in the Helmholtz domain" , and has recently been observed to 
have a bearing on the performance of data extrapolation techniques in seismic imaging [T. Lin 
and F. Herrmann, Geophysics, 2007]. 

This paper shows that ^i-Hclmholtz recovery also makes sense for wave computation, and 
identifies a regime in which it is provably effective: the one-dimensional wave equation with 
coefficients of small bounded variation. Under suitable assumptions we show that the number 
, of eigenfunctions needed to evolve a sparse wavefield defined on N points, accurately with very 

' high probability, is bounded by 

q{' C(?7) -logiV-loglogiV, 

, where C{ri) is related to the desired accuracy rj and can be made to grow at a much slower 

rate than A'' when the solution is sparse. The PDE estimates that underlie this result are new 
to the authors' knowledge and may be of independent mathematical interest; they include an 
estimate for the wave equation, an L°° — estimate of extension of eigenfunctions, and a 
, bound for eigenvalue gaps in Sturm-Liouville problems. 

^ ' In practice, the compressive strategy makes sense because the computation of eigenfunctions 

can be assigned to different nodes of a cluster in an embarrassingly parallel way. Numerical 
examples are presented in one spatial dimension and show that as few as 10 percents of all 
eigenfunctions can suffice for accurate results. Availability of a good preconditioner for the 
Helmholtz equation is important and also discussed in the paper. Finally, we argue that the 
compressive viewpoint suggests a competitive parallel algorithm for an adjoint-state inversion 
method in reflection seismology. 
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1 Introduction 



In this paper we consider a simple model for acoustic waves, 

a\x)^ix,t)-p^ix,t) = 0, xe[0,l], (1) 



du 

u{x,0) = uo{x), —{x,0)=ui{x), 

with Dirichlet {u{0,t) = u{l,t) = 0) or Neumann {u'{0,t) = u'{l,t) = 0) boundary conditions. The 
parameter cr{x) is viewed as the acoustic impedance of the medium, we will assume at least that 
< (Tmin ^ cr{x) ^ o"max) almost everywhere. In the sequel we will see how the classical equation 
of a vibrating string, with parameters p{x) and n{x), can be transformed into (1) in such a way 
that all the results of this paper hold unaffected in the more general case. 

Most numerical methods for solving (1) fall into two categories: either simulation in the time 
domain, by timestepping from initial and boundary data; or in the frequency domain using an 
expansion in terms of time-harmonic solutions. For the latter approach, when u{x,t) = Voj{x)e^'^*, 
then Vu, solves the Helmholtz equation 

v'^ix) + uj^a\x)vUx) =0, xe [0, 1], (2) 

with Dirichlet or Neumann boundary conditions. The special values of cu for which (2) has a 
solution correspond to eigenvalues A of the operator C = a~'^cf/dx'^ with the same boundary 
boundary conditions, through A = — o;^. The natural inner product that makes the operator jC 
self-adjoint is the one of the weighted space L^2([0, 



{f,g)= f f{x)g{x)a\x)dx. (3) 

(The regular Lp' inner product will not be used in the sequel.) Self-adjointness of C classically 
implies orthogonality of the eigenfunctions, in the sense that {v^J^,v^^^) = 5^^^^^J^. The operator C is 
also negative definite (Dirichlet) or negative semi-definite (Neumann) with respect to the weighted 
inner product, which justifies the choice of sign for the eigenvalues. As a result, any solution u{x, t) 
of (1) can be expanded as 

u{x,t) = ^c^{t)v^{x), (4) 

u> 

where 

sin ujt 

Coj{t) = {Uo,Vu,} COSUt + {Ui,Vu,) . 

If we are ready to make assumptions on u{x,t), however, equation (4) may not be the only 
way of synthesizing u{x, i) from its coefficients c^{t) = (n(-, t), i)^). The situation of interest in this 
paper is when u{x, t) is approximately sparse for each time t, i.e, peaks at a few places in x, and 
the impedance (t{x) has minimal regularity properties. When these conditions are met, we shall 
see that the information of the solution u{x, t) is evenly spread, and contained several times over 
in the full collection of coefficients Ci_j{t). In other words, there is a form of uncertainty principle 
to guarantee that a solution that peaks in x cannot peak in lo, and that it is possible to decimate 
Ci^{t) in such a way that a good approximation of u{x,t) can still be recovered. In addition, there 
exists a nonlinear procedure for this recovery task, which is so simple that it may have nontrivial 
implications for the numerical analysis of wave equations. 
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1.1 The Compressive Strategy 

How to recover a sparse, sampled function f{j/N) from a set of discrete orthobasis coefficients 
Cfc, which is incomplete but nevertheless "contains the information" of f{j/N), has been the concern 
of a vast body of recent work on compressed sensing [13, 28]. In this approach, the proposed solution 
for recovering f{j/N) is to find a vector with minimum li norm among all vectors which have the 
observed as coefficients. Compressed sensing currently finds most of its applications in signal or 
image processing, but its philosophy turns out to be relevant for computational wave propagation 
as well, where the basis vectors are discretized eigenfunctions of the Helmholtz equation. 

Accordingly, we formulate an i\ problem for recovering u{x,t) at fixed time t from a restricted 
set of coefficients C(^(t). To guarantee that these coefficients be informative about u{x,t), we will 
sample u at random. For practical reasons explained later, we use the following sampling scheme: 
1) draw a number w uniformly at random between and some maximum value Wmax, 2) find the 
closest a;[fe] to w, such that A[fe] = ~^[k] eigenvalue, 3) add this eigenvalue to a list provided 

it does not already belong to it, and 4) repeat until the size of the list reaches a preset value K. 
Eventually, denote this list as = {^[k] ■ k = 1, . . . , K}. 

Putting aside questions of discretization for the time being, the main steps of compressive wave 
computation are the following. Fix t > 0. 



1. 


Form the randomized set Qk and compute the eigenvectors Vi_j{x) for co G ^^k] 




2. 


Obtain the coefficients of the initial wavefields as {uo,Vi_j) and (tti,Ua,), for lo G Qk', 




3. 


Form the coefficients of the solution at time t as 

sin cot 

Cu>{t) = {uo,v^} cosut + {ui,Vu,) , CO G Uk; 




4. 


Solve the minimization problem 






min / a{x)\u{x)\dx, such that (u, v^;) = Ct^,(t), oj^Q^k- 
« 7o 


(5) 



The algorithmic specifics of points 1 and 4 will be discussed at length in the sequel. The 
main result of this paper concerns the number K = \^k\ of coefficients needed to ensure that 
the minimizer u{x) just introduced approximates u{x,t) within a controlled error, and controlled 
probability of failure. 

1.2 Main Result 

Specific notions of sparsity of the solution, and smoothness of the medium, will be used in the 
formulation of our main result. 

• Sparsity of the solution. Assume that u[j](t) is the solution of a discretization of the problem 
(1) on N equispaced spatial points, i.e., u[j]{t) ~ u{j/N, t). A central quantity in the analysis 
is the "size of the essential support" Sj] of this discretized solution, i.e., in a strategy that 
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approximates u[j]{t) by its S largest entries in modulus, how large would S need to be so 
that the error made is less than a threshold 77 in a weighted ii sense. Then this particular 
choice of S is called 5^. More precisely, for fixed t > consider a tolerance r? > 0, and the 
largest level 7 at which 

E ^bii^biwi (6) 

where a\j] should be thought of as a{j/N), or possibly a more sophisticated variant involving 
cell averages. Then is the number of terms in this sum, i.e., the number of grid points 
indexed by j such that ^ 7. In all cases Sj) ^ N, but the sparser u{x,t) the smaller 

• Smoothness of the medium. We consider acoustic impedances a{x) with small total variation 
Var log(cr). In particular, we will require Var(logcj) < vr. See Section 3 for a discussion of 
total variation. Note that a{x) is permitted to have discontinuities in this model. 

Three sources of error arise in the analysis: 1) the discretization error 

T=\\u\j]{t)-<j/N,t)\\e„ 

where the £2 norm is over j; 2) the truncation error tj just introduced, due to the lack of exact 
sparsity; and 3) a numerical error e made in computing the discrete eigenvalues and eigenvectors, 
which shows as a discrepancy 



for fixed t, and where the £2 norm is over uj € In addition to the absolute accuracy requirement 
that e be small, we also need the following minor relative accuracy requirements for the eigenvalues 
and eigenvectors. Two quantities A, B are said to be within a factor 2 of each other if A/2 ^ B ^ 
2A. 

Definition 1. (Faithful discretization) A spatial discretization of the Helmholtz equation (2) is 
called faithful if the following two properties are satisfied. 

• The gap \Cbn+i — between two consecutive eigenvalues of the discrete equation is within a 
factor 2 of the gap \uJn+i — between the corresponding exact eigenvalues; 

• There exists a normalization of the computed eigenvectors vcj \j\ and exact eigenvectors (x) 

such that the discrete norm (^jj <T[j]^{;(i [j]^^ is within a factor 2of\\vu,\\L2^, and simul- 
taneously maxj is within a factor 2 of \\aVu,\\L'^. 

Finally, we assume exact arithmetic throughout. The following theorem is our main result. 

Theorem 1. Assume that Var(loga") < n, and that the discretization of (2) is faithful in the 
sense of Definition 1. Assume that K eigenvectors are drawn at random according to the procedure 
outlined above. There exists C{u) such that if K obeys 

K ^ C{a) ■ Sr, log N ■ \og\Sr,) \og{Sr, log iV), (7) 

(with N sufficiently large so that all the logarithms are greater than 1), then with very high proba- 
bility the solution u[j]{t) of a discrete version of the minimization problem (5) obeys 

\\u{j/N,t)-u\j\{t)U ^ -^ry + C2£ + r. (8) 
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Furthermore, C(a) can he taken to obey 



^ 7r+yar(loga) _ (2Far(loga)), (9) 
TT — Variiog a) 

where C3 is a numerical constant. "Very high probability" here means 1 — 0(iV~") where a is an 
affine, increasing function of C3 . 

The discrete CWC algorithm is described in Section 2.2. The proof of Theorem 1 is in Sections 
2.3 and 2.4. 

Contrast this behavior of the £1 problem by considering instead an £2 regularization; by Plancherel 
the solution tx" would then simply be the truncated sum 

u^{x,t) = ^ c^{t)v^{x), 

with error 

M.,t)-J{.,t)\\h = J2 

This shows that may be very different from u as soon as Qk is not the complete set. In fact, the 
uncertainty principles discussed in this paper would show that the resulting error for sparse u{x, t) 
and random is large with very high probability. 

The estimate (8) is a statement of robustness of the ii recovery method, and is intimately 
related to a well-established stability result in compressed sensing [12]. If errors are made in the 
discretization, in particular in computing eigenvectors on which the scheme is based (e 7^ 0), then 
equation (8) shows that this error will carry over to the result without being overly amplified. 
This robustness is particularly important in our context since in practice a compressive numerical 
scheme is bound to be the "driver" of a legacy code for the Helmholtz equation. 

Theorem 1 also states that, if the discrete solution happens to be compactly supported on a 
small set {S^^ « N for 77 = 0), and no error is made in solving for the Helmholtz equation, then the 
compressive scheme would recover the discrete solution exactly, with high probability, and without 
using all the eigenvectors. It is not unreasonable to speak about compact support of the solution 
of a wave equation, since the speed of propagation is finite. 

Another important point is that there exists no particular, "optimized" choice of the eigenvectors 
that can essentially beat choosing them at random. The compressive strategy, whether in numerical 
analysis or signal processing, is intrinsically probabilistic. In fact, making a deterministic choice 
can possibly void the reliability of recovery as there would exist counter-examples to Theorem 1. 

Finally, the estimate is quite important from the viewpoint of computational complexity. In 
situations where 5^ is small, computing K ^ Sr, log N eigenvectors whose identity is unimportant 
as long as it is sufficiently randomized, can be much more advantageous than computing all N 
of them. This leads to an easily parallelizable, frequency-based algorithm for the wave equation 
requiring at most ~ SrjN'^ log N operations instead of the usual ~ for a QR method computing 
all eigenvectors. Complexity and parallelism questions are further discussed below. 



1.3 Why It Works 

The li regularization is chosen on the basis that it promotes sparsity while at the same time 
defining a convex optimization problem amenable to fast algorithms. It provides an exact relaxation 
of £0 problems when the vector to be recovered is sufficiently sparse, as was identified by David 
Donoho and co-workers in the mid-nineties in the scope of work on basis pursuit [18]. The same £1 
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minimization extracts information of a sparse vector remarkably well in the presence of incomplete 
data, provided the data correspond to inner products in a basis like Fourier, for which there exists 
a form of uncertainty principle. This observation was probably first treated mathematically in 1989 
in [29], and was refined using probabilistic techniques in work of Candes, Romberg, and Tao [13], 
as well as Donoho [28], both in 2004. This line of work received much attention and came to be 
known as compressed sensing or compressive sampling. 
Consider the ii minimization problem in M^, 

min||/||i s.t. f ■ (Pk = Ck, ke9.K, 

where ilx is a subset of 1, . . . , A^, and {f^k} is an orthobasis. The inner product f ■ (pk is called 
a "measurement" . The following three conditions are prerequisites for guaranteeing the success of 
recovery of ii minimization. 

1. The vector / to be recovered needs to be sparse; 

2. The basis vectors ipk are incoherent; and 

3. The actual fk used as measurement vectors are chosen uniformly at random among the ipk- 

Sparsity of a vector can mean small support size, but in all realistic situations it is measured 
from the decay of its entries sorted in decreasing order. Typically, a vector is sparse when it has a 
small £p quasi- norm for < p ^ 1, and the smaller p the stronger the measure of sparsity. 

Incoherence of an orthobasis {ipk} of M'^ means that 

sup l^klj]] ^ -j=, (10) 
j,k=i,...,N VN 

where the parameter fJ,^ 1, simply called incoherence, is a reasonably small constant. For instance, 
if (fk are the column of the isometric discrete Fourier transform, then /j, attains its lower bound 

of 1, and the Fourier vectors are said to be maximally incoherent. The generalization where an 
orthobasis is incoherent with respect to another basis is often considered in the literature, in which 
case incoherence means small inner products of basis vectors from the two bases. Hence in our case 
we also speak of incoherence with respect to translated Diracs. 

The condition of uniform random sampling of the measurement vectors is needed to avoid, 
with high probability, complications of an algebraic nature that may prevent injcctivity of the 
projection of sparse vectors onto the restricted set of measurements, or at least deteriorate the 
related conditioning. More quantitatively, randomness allows to promote the incoherence condition 
into a so-called restricted isometry property (RIP). We will have more to say about this in the sequel. 
Note that if the measurement basis is itself generated randomly, e.g. as the columns of a random 
matrix with i.i.d. gaussian entries, then the further randomization of the measurement vectors is 
of course not necessary. 

When these three conditions are met, the central question is the number K of measurements 
needed for the ii recovery to succeed. Recent papers [14, 49] show that the best answer known to 
date is 

K ^ C • /x^ • S log AT • log2(5) \og{S log N), 

where S is the number of "big" entries in the vector to be recovered, /x is the incoherence, and 
C is a decent constant. (The trailing log factors to the right of S'logA^ are conjectured to be 
unnecessary.) Stability estimates of the recovery accompany this result [12]. All this will be made 
precise in Section 2; let us only observe for now that (7) is manifestly a consequence of this body 
of theory. 
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The mathematical contribution of this paper is the verification that the conditions of 1) sparsity, 
2) incoherence, and 3) uniform sampling are satisfied in presence of Helmholtz measurements for 
solutions of the wave equation, and in the context of a practical algorithm. For all three of these 
requirements, we will see that a single condition of bounded variation on log a suffices in one spatial 
dimension. 

• Wc show in Section 3.1 that when Var(log a) < 1, the wave equation (1) obeys a Strichartz-like 

estimate 

rl / pi rl rx 

/ a{x)\u{x,t)\dx ^ D{a) ■ I / a{x)\uo{x)\dx + / (^"^{x)] / ui{y)dy\dx 
Jo \Jo Jo Jo 

for t ^ l/fTmin) and where 

^^^^ = ^V^l-VaJ(log^)' 

This result shows that, in an sense, the sparser the initial conditions the sparser the 

solution at time t. Hence choosing sparse initial conditions gives a control over the quality 
of ii recovery through the discrete quantity 5^ introduced above — although we don't have a 
precise estimate to quantify this latter point. 

• We show in Section 3.2 that when Var(loga") < oo, solutions of the Helmholtz equation must 
be extended, which translates into a property of incoherence at the discrete level. If solves 
(2), the extension estimate is 

^ \/2 exp (Var (log cr)) • Hv^Hl^ • 



A quadrature of the integral on the right-hand side would reveal a factor l/viV, hence a com- 
parison with the definition (10) of incoherence shows that the leading factor \/2 exp (Var (log cr)) 
is in fact — modulo discretization questions — an upper bound on the incoherence fi of eigen- 

functions with translated Diracs. 



We show in Section 3.3 that when Var(log cr) < vr, two eigenvalues of the Helmholtz equation 



(2) cannot be too close to each other: if Xj = — for j = 1,2 are two distinct eigenvalues. 



then 

TT — Var(log cr) , , 7r + Var(logcr 

^ \UJI -L02\ < 



[^a{x)dx f^^a{x)dx 

This gap estimate shows that if we draw numbers uniformly at random within [0,a;max]) and 
round them to the nearest uj^i-j for which A[fc] = —'^'^kj is an eigenvalue, then the probabilities 
Pn of selecting the 77,-th eigenvalue A„ are of comparable size uniformly in n — again, modulo 
discretization questions. This provides control over departure from the uniform distribution, 
and quantifies the modest penalty incurred in the bound on K as the ratio of probabilities 

minnPn vr - Var (log cr) 



Punif TT -I- Var (logo-) 
where Punif refers to the case of uniform cr and equispaccd uj^- 

Bounded variation can be thought of as the minimum smoothness requirement of an one- 
dimensional acoustic medium, for which wave propagation is somewhat coherent, and no local- 
ization occurs. For instance, the incoherence result physically says that the localization length is 
independent of the (temporal) frequency of the wave, for media of bounded variation. 

All the points discussed above will be properly integrated in the justification of Theorem 1, in 
Section 2. The proofs of the three PDE results are in Section 3. 
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1.4 How It Works 



A classical algorithm for computing all N eigenvectors of the discrete Helmholtz equation would 
be the QR method, with complexity an iterative 0{N^). Since not all eigenvectors are required 
however, and since the discretized operator C is an extremely sparse matrix, all-purpose linear 
algebra methods like QR acting on matrix elements onc-by-one arc at a big disadvantage. 

Instead, it is more natural to set up a variant of the power method with randomized shifts for 
computing the desired eigenvectors and corresponding eigenvalues. We have chosen the restarted 
Arnoldi method coded in Matlab's eigs command. In our context, the power method would compute 
the inverse {C + w'^)^^ ioi a shift w chosen at random, and apply it repeatedly to a starting vector 
(with random i.i.d. entries, say), to see it converge to the eigenvector with eigenvalue closest to 
—w'^. The probability distribution to place on w should match the spectral density of C as closely 
as possible; one important piece of a priori information is the estimate (34) on eigenvalue gaps. 

Applying {jC+w'^)~^, or in practice solving the system {jC+w'^)u = f, can be done in complexity 
0{N'^) using an iterative method. However, the inversion needs not be very accurate at every step, 
and can be sped up using an adequate preconditioner. It is the subject of current research to bring 
the complexity of this step down to 0{N) with a constant independent of frequency w, see [34] for 
some of the latest developments. In this paper we use a particularly efficient preconditioner based 
on discrete symbol calculus [24] for pre-inverting C — w'^ (there is no typo in the sign), although 
the resulting overall complexity is still closer to 0{N'^) than to 0{N). 

The resolution of £i minimization can be done using standard convex optimization methods 
[8] such as interior point for linear programming in the noiseless setting, and second order cone 
programming in the noisy case [18]. One can however exploit the separability of the norm 
and design specific solvers such as exact [30] or approximate [32] path continuation and iterative 
thresholding [35, 22, 20, 54]. In this paper we used a simple iterative thresholding algorithm. 

The complexity count is as follows: 

• It takes 0{KN'^) operations to solve for K eigenvectors with a small constant, and as we 
have seen, K is less than N in situations of interest; 

• Forming spectral coefficients at time t = and their counterpart at time t is obviously a 
0{N) operation. 

• It is known that solving an ii problem with N degrees of freedom converges geometrically 
using the methods explained below [9] and is therefore a 0{N) operation. 

Compared to the QR method, or even to traditional timestepping methods for (1), the compres- 
sive scheme has a complexity that scales favorably with N. The control that we have over the size 
of K comes from the choice of sparse initial conditions, and also from the choice of time t at which 
the solution is desired. If the initial conditions are not sparse enough, they can be partitioned into 
adequately narrow bumps by linearity; and if t is too big, the interval [0,t] can be divided into 
subintervals over which the compressive strategy can be repeated. 

More details on the implementation can be found in Section 4. 

1.5 Significance 

The complexity count is favorable, hnt wc believe the most important feature of the com- 
pressive solver, however, is that the computation of the eigenvectors is "embarrassingly parallel"', 
i.e., parallelizes without communication between the nodes of a computer cluster. This is unlike 
both traditional timestepping and Helmholtz- via- QR methods. The £i solver — proximal iterative 
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thresholding — also parallelizes nicely and easily over the different eigenvectors, without setting up 
any domain decomposition method. We leave these "high-performance computing" aspects to a 
future communication. 

Another decisive advantage of the compressive strategy is the ability to freely access the solution 
at different times. In inverse problems involving the wave equation, where an adjoint-state equation 
is used to form the gradient of a misfit functional, one is not just interested in solving the wave 
equation at a given time t, but rather in forming combinations such as 



where u solves an initial-value problem, and q solves a final-value problem — the adjoint-state wave 
equation. A classical timestepping method requires to keep large chunks of the history in memory, 
because u{x, t) is formed by time stepping up from t = 0, and q{x, t) is formed by time stepping 
down from t = T. The resulting memory overhead in the scope of reverse-time migration in 
reflection seismology is well documented in [57]. The compressive scheme in principle alleviates 
these memory issues by minimizing the need for timestepping. 

Let us also comment on the relief provided by the possibility of solving for incomplete sets 
of eigenvectors in the scope of compressive computing. As we saw, using an iterative power-type 
method has big advantages over a QR method. Could an iterative method with randomized shifts 
be used to compute all eigenvectors? The answer is hardly so, because we would need to "collect" all 
eigenvectors from repeated random sampling. This is the problem of coupon collecting in computer 
science; the number of realizations to obtain all N coupons, or eigenvectors, with high probability, is 
0{N log N) with a rather large constant. If for instance we had already drawn all but one coupons, 
the expected number of draws for finding the last one is a 0{N) — about as much as for drawing the 
first N/2 ones! For this reason, even if the required number of eigenvectors is a significant fraction 
of A^, the compressive strategy would remain attractive. 

Finally, we are excited to see that considerations of compression in the eigenfunction domain 
requires new kinds of quantitative estimates concerning wave equations. We believe that the ques- 
tions of sparsity, incoherence, and eigenvalue gaps are completely open in two spatial dimensions 
and higher, for coefficients a{x) that present interesting features like discontinuities. 

1.6 Related Work 

As mentioned in the abstract, and to the best of our knowledge, ii minimization using Helmholtz 
eigenfunctions as measurement basis vectors was first investigated in the context of seismic imag- 
ing by Lin and Herrmann [41] in 2007. Data extrapolation in seismology is typically done by 
marching the so-called single square root equation in depth; Herrmann and Lin show that each 
of those steps can be formulated as an optimization problem with an £i sparsity objective in a 
curvelet basis [11], and constraints involving incomplete sets of eigenfunctions of the horizontal 
Helmholtz operator. The depth-stepping problem has features that make it simpler than full wave 
propagation -extrapolation operators with small steps arc pscudodifferential hence more prone to 
preserving sparsity than wave propagators — but [41] deals with many practical considerations that 
we have idealized away in this paper, such as the choice of basis, the higher dimensionality, and 
the handling of real-life data. 

Sparsity alone, without using eigenfunctions or otherwise incoherent measurements, is also the 
basis for fast algorithms for the wave equation, particularly when the same equation needs to be 
solved several times. Special bases such as curvelets and wave atoms have been shown to provide 
sparse representations of wave propagators [51, 10]. Speedups of a factor as large as 20, over both 
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spectral and finite difference methods, liave been reported for a two-dimensional numerical method 
based on those ideas [25]. See also [33] for earlier work in one spatial dimension. Computational 
harmonic analysis for the solution of PDE has its roots in the concept of representing singular 
integral operators and solving elliptic problems using wavelets [44, 6, 19]. 

From a pure complexity viewpoint, methods based solely on decomposing the equation in a fixed 
basis are not entirely satisfactory however, because of the heavy tails of supposedly nearly-diagonal 
matrices. The resulting constants in the complexity estimates are cursed by the dimensionality 
of phase-space, as documented in [25]. There is also the question of flexibility vis-a-vis special 
domains or boundary conditions. We anticipate that representing the equation in an incoherent 
domain, instead, followed by a sparse recovery like in this paper may not suffer from the same ills. 

Of course, sparsity and incoherence ideas have a long history in imaging problems unrelated to 
computation of PDE, including in geophysics [50]. As mentioned earlier, from the mathematical 
perspective this line of work has mostly emerged from the work of Donoho and collaborators [29, 18]. 
See [43] for a nice review. 

Finally, previous mathematical work related to the theorems in Section 3 are discussed at the 
end of the respective proofs. 



2 The Compressive Point of View: from Sampling to Computing 

In this section we expand on the reasoning of Section 1.3 to justify Theorem 1. We first introduce 
the quoted recent results of sparse signal recovery. 

2.1 Elements of Compressed Sensing 

Consider for now the generic problem of recovering a sparse vector /q of from noisy mea- 
surements y = $/o + -2 G M^, with K ^ N, and ||2:||2 ^ £■ The £i minimization problem of 
compressed sensing is 

min||/||4, s.t. \\^f-y\\2^e. (Pi) 

At this level of generality we call $ the measurement matrix and take its rows (pk to be orthonormal. 
Accurate recovery of (Pi) is only possible if the vectors are incoherent, i.e., if they are as different 
as possible from the basis in which /o is sparse, here Dirac deltas. Candes, Romberg and Tao make 
this precise by introducing the S-restricted isometry constant 63, which is the smallest < S < 1 
such thcit 

{l-S)\\c\\l^\\^Tc\\l^{l + 5)\\c\\l. (11) 

for all subsets T of {1, . . . , N} such that \T\ ^ S, for all vectors c supported on T, and where is 
the sub-matrix extracted from $ by selecting the columns in T. Equation (11) is called S -restricted 
isometry property. 

Candes, Romberg and Tao proved the following result in [12]. 

Theorem 2. For fo G M^, call fo,s the best approximation of /o with support size S. Let S be 
such that S2S + 3(545 < 2. Then the solution f of (Pi) obeys 

11/ - /0II2 ^ Ci £ + C2 (12) 

The constants Ci and C2 depend only on the value of 63s o,nd 84^3 ■ 
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Sparsity of /o is encoded in the rate of decay of ||/o — /o.sll^i as S" ^ oo. 

The problem of Unking back the restricted isometry property to incoherence of rows was perhaps 
first considered by Candes and Tao in [14] . In this paper we will use the following refinement due to 
Rudelson and Vershynin [49] . It relates the allowable value of S to the number K of measurements 
and the dimension N of the vector to be recovered. 

Theorem 3. Let A be a N-by-N orthogonal matrix, and denote fi = s/N maxjj |Ay|. Extract $ 
from A by selecting K rows uniformly at random. For every e there exists a constant > such 
that if 

K^C,-i?- SlogN- log2(S)log(51og7V)), (13) 
then # obeys the S -restricted isometry property (11) with 6s < s and with very high probability. 

Randomness plays a key role in this theorem; no instance of a corresponding deterministic result 
is currently known. Above, "very high probability" means tending to 1 as 0{N~"^), where m is an 
affine increasing function of C in (13). The proof of Theorem 3 in [49] uses arguments of geometric 
functional analysis and theory of probability in Banach spaces. Note that [49] prove the theorem 
for |U = 1, but it is straightforward to keep track of the scaling by /j, in their argument^. 

2.2 Discretization 

Equations (12) and (13) formally appear to be related to the claims made in Section 1.3, but 
we have yet to bridge the gap with the wave equation and its discretization. 

Let us set up a consistent finite element discretization of (1) and (2), although it is clear that 
this choice is unessential. The Helmholtz equation with boundary conditions can be approximated 
as 

Lijvc:,\j] + ^ Mijva,\j] = 0, 
j i 

where the stiffness matrix L (a discretization of the second derivative) and the mass matrix M 
properly include the boundary conditions. Square brackets indicate integer indices. Here u is the 
discrete counterpart to the exact cj, and has multiplicity one, like w, by the assumption of faithful 
discretization (Definition 1). The discrete eigenvector vc:j[j\ is an approximation of the eigenfunction 

samples v^{j/N). 

We assume for convenience that the mass matrix can be lumped, and that we can write Mij = 
a[i]'^5ij. If a is smooth, then (j[i] ~ (T(i/A^); and if a lacks the proper smoothness for mass lumping 
to be a reasonable operation, then all the results of this paper hold with minimal modifications. 
We also assume that Lij is symmetric negative definite; as a result, so is a[i]^^ Lija\j]~^ , and the 
usual conclusions of spectral theory hold: a; < 0, and <7[j]t;ti[j] are orthogonal for different lo, for 
the usual dot product in j. 

The corresponding semi-discrete wave equation (1) is 

= uoU/N), ^(0) = u,{j/N) 

^For the interested reader, [49] denotes fj, by K. The scahngs of some important quantities for the argument in 
[49], in their notations, are Cio ~ K, Ei ~ K, ki ~ K, and k ~ K^. 
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Because Lij is the same stiffness matrix as above, the solution is 



with 



Can 



^bJ(*) = 2^ ( cos{(jjt)co^a> H ^ — ci,i, 1 va,[j\. 



CO,. = Yl^\j?Vu,\j] uUm, ci,^ = J2^[j?vM ^(0). 



/ N \ sinfwt) 
C(i(tj = cos(a;i)co,d; H : — ci,^,. 

A discretization error is incurred at time t > 0: wc have already baptized it r = \\u[j]{t) — u{j /N, t)\\2 
in the introduction. It is not the purpose of this paper to relate r to the grid spacing. A nice 
reference for the construction of finite elements for the wave equation, in a setting far generalizing 
the smoothness assumptions made on a in this paper, is in [47]. 

Let us now focus on the discrete formulation. The ideal £i problem that we would like to solve 

is 

mmJ2 <^[j]Hj]{t)\, s.t. JZ^tjf ^blW^^iZ'b] = 
j j 

where Co G are chosen uniformly at random. 

In practice, we must however contend with the error in computing the discrete eigenvalues cD 
and eigenvectors Vi:j[j] by an iterative linear algebra method. This affects the value of ccj{t) as well 
as the measurement vectors in the equality constraints above. We model these errors by introducing 
the computed quantities Va;[j] and Ci,(t), and relaxing the problem to 

mmY^a[j]\u\j]it)\, s.t. || ^bf - c^(i)||2 ^ e, (14) 

j j 

for some adequate^, e, like mentioned throughout in Section 2.1. The £2 norm is here over a). 



2.3 Proof of Theorem 1 

Let us now explain why the machinery of Section 2.1 can be applied to guarantee recovery in 
the £1 problem (14). First, it is convenient to view the unknown vector to be recovered in (14) as 
cj[j]|u[j](t)|, and not simply u[j](t). This way, we are exactly in the situation of Theorem 2: the 
objective is a non-weighted £1 norm, and the measurement vectors A(;jj = CF[j]vcj[j] are orthogonal 
with respect to the usual dot product in j. 

Sparsity of the discrete solution (T[i]'u[i](t) is measured by the number of samples 5^ necessary 
to represent it up to accuracy rj in the £1 sense, as in equation (6). If wc let u^[j]{t) be the 
approximation of ii[j](t) where only the S largest entries in magnitude are kept, and the others put 
to zero, then Sj) is alternatively characterized as the smallest integer S such that 

J2^[j]\u\jm-u'\jm\^r,. 

This expression is meant to play the role of the term ||/o — /o,5||^i in equation (12). 

One last piece of the puzzle is still missing before we can apply Theorems 2 and 3: as explained 
earlier, the methods for drawing eigenvalues at random do not produce a uniform distribution. The 

^Here e is assumed to be known, but in practice it is permitted to over-estimate it. 



12 



following proposition quantifies the effect of nonuniformity of the random sampling on the number 
of measurements K in Theorem 3. Recall that our method for drawing eigenvectors qualifies as 
without replacement since an eigenvalue can only appear once in the set fix- 

Proposition 4. Let A be an N-by-N orthogonal matrix. Denote by K^^if ihe number of rows taken 
uniformly at random in order to satisfy the accuracy estimate (8) with some choice of the constants 
Ci, C2 . Now set up a sampling scheme, nonuniform and without replacement for the rows of A, 
as follows: 

• In the first step, draw one row from a distribution Pn, and call it Ui; 

• At step k, draw one row from the obviously rescaled distribution 

Pn 



and call it n^. Repeat over k. 



Denote by K the number of rows taken from this sampling scheme. In order to satisfy the estimate 

(8) with the same constants C\ and C2 as in the uniform case, it is sufficient that K compares to 

I^unif 

jy- ^ Punif 

^ . ^unif- 

T^Va.n=l,...,N Pn 

where Punij would he the counterpart of Pn in the uniform case, i.e., Pumf= 1/iV- 

Proof. See the Appendix. □ 

We can now apply the theorems of compressed sensing. Call the solution of (14). By 

Theorem 2, the reconstruction error is 

ii^bK*) - <j/N,t)h ^ \\u\jm - <j/NM2 + mm - y^m)h 

^ T + Ci-^+C2S. 

The number K of eigenvectors needed to obtain this level of accuracy with very high probability 
is given by a combination of Theorem 3 and Proposition 4: 

K^C- (^^■A-Sr,logN-log\Sr,)logiS^logN)), (15) 

and where the incoherence fx is given by 

II = Vn niax |o-[j>a,[j]| (16) 

This justifies (7) and (8) with the particular value 

C{a) = C-( ^HHiL . A . (17) 
\imn Pn } 

It remains therefore to justify the link between C{a) and the smoothness of a, given in equation 

(9) . This entails showing that 

1. [i has a bound independent of N (eigenvectors are incoherent); and 
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2. the ratio of probabihties can be bounded away from zero independently of A^. 

In addition, we would like to argue that 5^ can be much smaller than N, which expresses 
sparsity. All three questions arc answered through estimates about the wave equation, which are 
possibly new. Although the setup of the recovery algorithm is fully discrete, it is sufficient to 
focus on estimates the non-discretized wave equation; we explain below why this follows from the 
assumption of faithful discretization in Definition 1. 



2.4 Sparsity, Incoherence, and Randomness 

We address these points in order. 

• Sparsity. The quantity S^j introduced above depends on time t and measures the number 

of samples needed to represent the discrete solution n[j](t) to accuracy ij in (a weighted) ii 
space. It is hoped that by choosing sparse initial conditions, i.e., with small 5^ at time t = 0, 
and restricting the time T up to which the solution is computed, will remain small for all 
times ^T. If we expect to have such control over 5^, it is necessary to first show that 
the ii norm of the solution itself does not blow up in time, and can be majorized from the ii 
norm at time zero. In Section 3.1 we establish precisely this property, but for the continuous 
wave equation and in the norm. The only condition required on a for such an estimate to 
hold is Var(log(T) < 1. 



Incoherence. We wish to bound n = VN max [j] \ by a quantity independent of N when 

the vcj [j] are £2 normalized in a weighted norm, 

In Section 3.2, we show that provided Var(log(7) < 00, we have the continuous estimate 

||o"t'a)||L°° ^ a/2 exp(Var(loga)) • ||vt^||L2 . 

At the discrete level, approximating the integral / a'^{x)\vaj{x)\'^ dx by the sum X^jLi ^bl^bo; 
shows that the quantity independent of N is the incoherence (16). More precisely, the assump- 
tion of faithful discretization allows to relate continuous and discrete norms, and conclude 
that 

fi^C ■ VN\\av^\\Loo- 7. — 7. ^ C • V2 exp(Var(log(T)), 

where C is the numerical constant that accounts for the faithfulness of the discretization, 
here C = 8 for the arbitrary choice we made in Definition 1. 

Randomness. Finally, for the question of characterizing the probability distribution for picking 
eigenvectors, recall that it derives from the strategy of picking shifts w at random and then 

finding the eigenvalue A[j.] = such that cjj^] is closest to w. To make sense of a probability 

distribution over shifts, we consider eigenvalues for the continuous Helmholtz equation (2) in 
some large interval [— M^^,0], or equivalently, ^ k < N. 

If (T = 1, then LVn = nvr, with n ^ 1 if Dirichlet, and n ^ if Neumann. The corresponding 
eigenfunctions are of course cos(n7r) (Neumann) and sin(n7r) (Dirichlet). In this 
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uniform sampling of the shifts w would generate a uniform sampling of the frequencies tjj^.] . 
We regard the spectrum in the general case when a{x) / 1 as a perturbation of LOn = nn. So 
we still draw the shifts w uniformly at random, and derive the corresponding distribution on 
the Lo^k] from the spacing between the Un- Namely if we let Ao^mim ^^max be the minimum, 
respectively maximTim distance between two consecutive ujn in the interval n G [0, — 1] — 
each eigenvalue is known to have multiplicity one — then the probability of picking any given 
eigenvalue by this scheme obeys 

Pn ^ Punif "T ) 

where Punif = jf would be the corresponding probability in the uniform case. 
In Section 3.3, we prove that the spacing between any two consecutive Un obeys 

TT — Var(log TT + Var(log a) 

—J——— ^ \Un-U>n+l\ ^ .1 , . , ' 

Jo a{x) dx Jo a{x) dx 

provided Var(log a) < oo. By the assumption made in Definition 1, the computed eigenvalues 
satisfy a comparable bound, 

1 TT — Var (log a) , o ^ Var (log a) 

77 J ^ — 1 ^ 2 J • 

2 a{x) dx /o a{x) dx 
In the continuous case, the gap estimate implies 

Pn ^ Au;min ^ TT - Var (log O") 

Punif ^ AWmax ^ TT + Var (log Cj) ' 

while an additional factor 1/4 is incurred in this lower bound, in the discrete case. 

Note that the probabilities of selecting the endpoint eigenvalues and — w^ax would in princi- 
ple be halved by the advocated sampling procedure, because their interval is one-sided. This 
is a non-issue algorithmically since the endpoint eigenvalues are much more easily computed 
than the others, by a power method without inversion. By default, we can include those 
eigenvalues in Qk- Mathematically, adding measurements (eigenvectors) deterministically 
cannot hurt the overall performance as we saw in Proposition 4. 

The observations on incoherence and randomness can be combined with (17) to justify the form 
of C{a) in equation (9): 

^/ X ^ ^ f Punii 2\ ^ ry 7r + Var(log(T) 

C{a) ^ C ■ — ] — — • exp(2Var(log(T)). 

\m.m Pn J TT — Var(logcr) 

3 Sparsity, Incoherence, and Gap Estimates for the Wave Equa- 
tion 

Let us first recall the basic results concerning wave equations. We only need to require ctq ^ 
a(x) ^ o"! for a.e. x G [0,1], to obtain existence and uniqueness. In that context, when uq G 
H^{0,1) and ui G L^(0,1), the solution obeys u G Ct°([0, T], i7^(0, 1)) n C/([0, T], (0, 1)), as well 
as the corresponding estimate 

\\u{t,x)\\L^Hi ^ C{T) ■ (ll^xolli/1 + IKiIIlO , (18) 
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where the supremum in time is taken (here an in the sequel) over [0, T]. Another background result 
is the continuous dependence on the coefficients a{x) in L°° . Let Uk, k = 1,2 solve (20) with ak in 
place of a, over the time interval [0,T]. Then 

ll^^i - U2\\l°°l^ ^ C{T,uo,ui) ■ \\ai - c72||loo. (19) 

All these results are proved by standard energy estimates. See [42] and [55] for existence and 
uniqueness; and [3, 4] for continuity on the parameters. 

It seems that we have lost a bit of generality in considering the wave equation (1) with a single 
parameter a{x), instead of the usual equation of acoustics 

p{x) + V • {Kx)Vu) = 0, X G W^, (20) 

with the two parameters p{x) (density) and /x(x) (bulk modulus). Equation (1) however follows 
from (20) if we change the unique spatial variable z into 

x = [ fi-\z')dz', 
Jo 

and put cr{x) = p{x)ii{x) the local acoustic impedance. With bounded from above and be- 
low a.e., such a change of variables would only alter the constants in the results of sparsity and 
incoherence proved in this section. It would not alter the eigenvalue gap result of Section 3.3. 
Hence we can focus on (1) without loss of generality. For reference, the local speed of sound is 

V{x) = ^J|l{x)/p{x).^ 

In the sequel we further assume that logcr has hounded variation. The space SF([0, 1]) is 
introduced by defining the seminorm 

Var(/) = sup^ [/(xj-i) - f{xj)\, 

where the supremum is over all finite partitions of [0, 1] such that xj is a point of approximate 
continuity of /. Var is called the essential variation^ of /, and also equals the total variation 
|/'|([0, 1]) of /' as a signed measure on [0.1]. The norm of i3y([0, 1]) is then ||cr||Bi/ = ||cr||^i + 
Var(cT). If the total variation is taken over the interval [0,x] instead, we will denote it as YaTx{<^)- 
Wc will need the following result for BV functions in one dimension. 

Lemma 1. Let f G BV{[0,1]), and extend it outside of [0,1] by the constant values /(O^) and 
/(1~) respectively. For every e > 0, consider a mollifier Pe{x) = ^p (|), where p G C°°(M), p ^ 0, 
supp{p) C [—1, 1], and p{x) dx = 1. Consider fe{x) = pe{x — y)f{y) dy for x G M. Then 

1. For each e > 0, fs e C°°(R); 

2. lim£_^o Jo \f{x) - fs{x)\ dx = 0; 

^Notice that a liomcomorphism z x is the natural obstruction to the ID inverse problem of recovering the 
parameters p and yu from boundary measurements of u. As a result only the local impedance is recoverable up to 
homeomorphism from boundary measurements, not the local speed of sound. This observation, only valid in one 
spatial dimension, is discussed in great detail in [3]. 

*The variation of / would be defined using the same supremum, but over all partitions of [0, 1]. Requiring that Xj 
is a point of approximate continuity addresses the problem of inessential discontinuities of where f{x) is not in the 
interval defined by the left and right limits f{x~) and f{x'^). See [60] on p. 227 or [27] on p. 17 for a comprehensive 
discussion. 
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3. For each e > and x G M, minj-gp,!] fi^) ^ fei^) ^ max^gjo,i] fi^)i 

4- For each e > 0, Var(/e) = \fe{x) \ dx; 

5. lim,^oMA)= Var{f). 

Proof. All these facts are proved in [60]: points 1 and 2 on p. 22, point 3 by elementary majorations 
(see also p. 22), point 4 on p. 227, and point 5 on p. 225. □ 

3.1 Analysis of Speirsity 

In this section we prove the following L?- estimate. 

Theorem 5. Letue Cj°([0, T], if^ (0, l))nC/([0, T], (0, 1)) solve (20) with Dirichlet or Neumann 
boundary conditions, and let Ui{x) = Ui{y) dy. Assume logcr G BF([0, 1]), with 

Var(log(T) < 1. 

Let tf = l/fj 

mill; o- lower bound on the time it takes a fully transmitted bump to travel the length of 
the interval [0, 1]. For each t > 0, let n be the smallest integer such that t ^ nt"; then 

/ \ 3/2 

\\u{;t)\\Li ^ 2 . • (ll^olUi + \\aU,h^) , (21) 

with D = i_var(iogcr) ' If i'nstead (20) is posed with periodic boundary conditions, then it suffices 
that Varflog cr) < 2 and the same result holds with D = - — , , / r . 

^ ' 1 — iVar(log(T) 

This result calls for a few remarks, which are best expressed after the proof is complete. 

Proof. The proof is divided into five steps. 

First, we approximate a G BV by an M-term piecewisc constant function (Tpc- Assuming the 
inequality holds for (jpc, we then show how to pass to the limit M oo. Second, in order to 
show the inequality for crpc, we approximate the solution u{-,t) by a piecewise interpolant on an 
equispaced A/'-point grid. This step allows to break up the solution into localized pulses that interact 
with one discontinuity of the medium at a time. Third, we recall the formulation of reflection and 
transmission of pulses at interfaces of a piecewise constant medium, and simplify it using a model 
in which cancellations are absent. This simplification provides an upper bound for a particular 
weighted norm of the wavefield. Fourth, we present a recursive bump splitting procedure 
for handling the exponential number of scattering events and homogenizing the corresponding 
traveltimes. Fifth, we quantify the growth of the weighted norm in terms of combinations of 
reflection and transmission coefficients. In particular, sums of reflection coefficients are linked back 
to the total variation of \oga{x). 

Let us tackle these points in order. 

1. The result of adaptive L°° approximation oi BV functions by piecewise constants in one 
dimension is due to Kahane and goes as follows. For a given partition {xj;j = 0, . . . , M} of 
[0, 1], where xq = and xm = 1, we define an M-term approximant as 

M 

^pc(^) = XI '^jA^j-uxi){x). (22) 
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Denote by Sjv/ the set of all such approximants, i.e., all the possible choices of a partition 
{xj} and coefficients aj. Then Kahane's inequality states that 

r II II Var((T) 

There exists at least one approximant dpc that reaches this bound, and that also satisfies 
Var(cjpc) ^ Var((j). Sec the nice review article [26] by Ron DeVore for a proof. 

We saw earlier in equation (19) that the solution depends continuously on a. The same bound 
holds, trivially, if we use the weaker norm for u: 

\\U1 - U2\\LfLl ^ C{T,Uo,Ui) ■ \\ai - (T2IIL00, 

This inequality provides a way of passing to the limit M — ^ 00 in (21), provided the constant 
D in (21) is shown to be uniform in M. 

2. Let us now decompose the initial conditions into localized bumps up to a controlled error. 
Since we need uq G //-"^(0, 1) and ui G L^(0, 1) for consistency with the basic theory, we 
can approximate both uq and Ui = J'^ ui by piecewise linear interpolants built from equis- 
paced samples. The Bramble-Hilbert lemma handles the question of accuracy of polynomial 
interpolants and is for instance well covered in basic texts on finite elements.^ 

Lemma 2. (Bramble-Hilbert) Let v G -f^^(0, 1). For each positive integer N, we let h = 1/N 
and 

N 

Vh{x) = ^v{ih)ipi^h{x), 

i=0 

where ipi^h o,f^ ihe "tent" interpolating functions which were defined as 

Vo,n{x) = (1 - Nx)xo^x^h{x), 'Pn,n{x) = (1 + N{x - l))xi-h<x^i{x) , 

"Pj^Nix) = (1 + N{x - jh))x{j-i)h^x^h{x) + (1 - ]^{x - jh))xjh<x^{j+i)h{x), 
for 1 ^ j ^ N - 1. Then 

11^ ~ 'I'/illifi ^ C ■ ll^^lliji • h, and \\v — Vh\\L'^ ^ C ■ \\v\\h'^ ■ h^- 



Consider now the solution Ufi{-.t) of (20) with the piecewise linear interpolants 77,9, ^ and Ui^h 
substituted for the initial conditions uq and Ui. The growth estimate (18) provides a way to 
control the discrepancy {uh — u){-,t). With denoting L°°{0,T), we have 

^ C(r) • (lino - uo^hWm + ll^^i - U^,h\\m) by (18) 
^ C{T) ■ (||no||_f/i + ||t^i||_f/i) ■ h by Lemma 2, 

which tends to zero as /i — 0. 

^Tent elements are only a mathematical tool in this section, they arc not used in the numerical method. Also note 
that we are not considering a full discretization here; only the initial conditions are modified. 
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It is sufficient to argue that (21) holds for locahzed initial conditions of the type (pi^h{x). 
Indeed, if we let mo,/j = ^ iio,i'/7i,/i(x), and denote by ipi^h{x,t) the solution of the wave 
equation with (pi^h^x) in place of uq and for Ui, then 




The last inequality is a simple — ii equivalence property for piecewise affine functions on 
equispaced grids. The corresponding inequality for u{x, t) and uq then follows after taking 
the limit h ^ 0, provided we show that D is independent of h. 

Next, we explain how to control the norm of traveling bump solutions Uh{-,t). 

3. Equation (20) has an explicit solution when ape is piecewise constant. Let us start by re- 
hearsing the textbook case of a medium with impedance ai for x < xq, and (72 7^ cri for 
X ^ Xq. Assume that both uq and Ui = ui are supported on {x : x < 0}, and form the 
linear combinations 

f{x) = uo{x) - aiUi{x), g{x) = uo{x) + aiUi{x). 

For small times, / will give rise to right-going waves and g to left-going waves. Without loss 
of generality, let us assume g = 0. Then the solution is given by 

( .^_{ f{x-t/'yi) + Rf{2xQ-x-t/(Ti) ifx^xo; 
«^X,tj-| ifx>xo. 

In order for both u and |j to be continuous at x = 0, we need to impose that the reflection 
and transmission coefficients be determined as 

^ _ 1 - CT2/ai ^ _ 2 



l + (J2/cri' l + cr2/cri' 

Note that the situation of a wave, initially supported on {x : x > 0}, and reflecting at the 
interface from the right, is entirely analogous. The reflection and transmission coeflicients 
would be obtained by interchanging a\ and a2- Let us call the situation described by (24) a 
single scattering event. 

In order to study the growth of /" |u(-, t)\dx^ it is important to remove the cancellations that 
occur in (24) when i? < 0. For this purpose, decouple wavefields as 

/ //(a^-*M) ifx^xo; 

ui{x,t) - I ^^^^^ ^ ^^^^^^ ^ ^ yi^) 

«2(x,i) = | (26) 

[0 if X > Xq. 

We have u = ui + U2, but the point of introducing the couple {ui,U2) is that there is one 
particular weighted norm that never decreases in time, namely the L^3/2 norm defined as 

|||i;|||:=/ \v{x)\{a{x)f/^dx, |||(^^i,^^2)||| := ||h||| + 
Jo 
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Indeed, |||'u(-, 0)||| = |||/||| in ai, and it is straightforward to write 



/ — t/cFl / \ 1/2 i-OO 

-oo J-t/ai 



Since i?^ + = 1 (conservation of energy), taking the square root of each term in this 



\ 1/2 

R\+[—\ T^l. 



CTi 

convex combination yields 

/ \l/2 

It is convenient to write T = l^j T. Upper and lower bounds for the L^3/2 norm of the 
couple ("^1,1x2) follow: 

lll/IIKIIK(-,i)lll + IIK(-,t)||K[|i?|+T]. Ill/Ill. (27) 

4. We can treat the more general case of a piecewise constant medium by decomposing the initial 
conditions uq and Ui into a collection of small bumps, following the preceding discussion of 
a single interface. Fix a partition {xj : ^ j ^ M} of [0, 1] defining an M-term approximant 
fjpc, such that (Tpc{x) = aj when xj^i < x ^ Xj for j ^ 1. Let us choose the parameter h in 
the construction of the tent functions ipi^h small enough that scattered bumps intersect with 
at most one discontinuity of a^c at a time. Since the minimum and maximum traveling speed 
are 1 /a^ax. and 1 / (Tmin respectively, it suffices to take 

h = mm \x-j-i — xA (28) 

Let us generically call if{x) such a bump, and assume that it is supported on {xj-i,Xj) for 
some ^ j ^ M. It gives rise to left- and right-going waves. 

• Consider ip as a right-going initial condition of the wave equation; namely, uq = ip/2 and 
Ul = —ip/ (2(Tj). Choose a time t* large enough that the first term ip{x — t*/aj) vanishes 
in (24), but small enough that no other scattering event than the one at Xj has taken 
place yet. Then the solution takes the form u = ui + U2, with 

ui{x,t*) = [nj,rV]{x + t*/aj), U2{x,f) = [Tj^Mix - t*/aj+i), (29) 

where we have introduced reflection and transmission operators 

['^j,Mix) = Rj,rV{2Xj - x), [Tj^r^]{x) = Tj^rV{xj + ^^(x - Xj)), 

with 

^ _ l-(jj+i/aj ^ 2 

l + Uj+l/Gj' l + Uj+l/Gj- 

The subscript r refers to the fact that the pulse came from the right. 

• If instead ip had been chosen to correspond to a left-going bump in (xj-\,Xj), then we 
would have had 

ui{xX) = [T^j/vlix-f/aj), U2{x,t*) = [Tj^rV'Kx + f/aj+i), 
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where now 



and 

Rij = z — ■ — - — -V-^, 



_ 1 - <^j-i/<^j 



In both cases, the original bump disappears at t = t* and give rise to a couple {ui,U2)- In 
the regime of multiple scattering when t > f* , bumps are recursively split and removed, using 
the above characterization. For instance, at time t*, wc can restart the wave equation from 
the left-going bump [JZj^rf]{x + t* /aj) and submit it to scattering at xj-i; and independently 
consider the right-going bump [Tj^rflix — t*/aj+i) and its scattering at Xj+i. Applying this 
procedure recursively after each scattering event, a binary tree in space-time is created, whose 
nodes are the scattering events and whose edges are the broken bicharacteristics. 

We therefore consider a collection of wavefields, i.e., an element u G ^(L^j/j) where B is 
an unordered set of bumps, equipped with the norm |||u||| = HKftlH) and such that the 
solution of the wave equation is recovered as u = X^^Ufe. Obviously, |||'u||| ^ |||u|||. 

The upper and lower bounds (27) on the ^^3/2 norm of a singly scattered wavefield can be 
applied recursively to address the multiple scattering situation. Every reflected bump picks 
up a factor Rj i or Rj^r as appropriate, and similarly every transmitted bump picks up a 
factor Tj^r = y^aj^i/aj Tj^r or Tj^£ = ^ aj-i/cTj Tj^£ as appropriate. 

For fixed time t, evaluating the number of scatterings that have taken place is an overwhelming 
combinatorial task. Instead, let = amin = minj aj be a lower bound on the total time it 
takes a nonreflecting bump to travel the interval (0, 1). 

Consider now the case of periodic boundary conditions. Dirichlct and Neumann boundary 
conditions will be treated in point 6 below. Define the extended medium 



(Tpc(O) if x ^ -1; 

o'pdx + 1) if — 1 < X ^ 0; 

C7pc(x) if < x ^ 1; 

(^pc{x — 1) if 1 < a; ^ 2; 

(7pc(l) if a; ^2. 



With the same initial conditions supported inside [0, 1] , any scattering that takes place in Upc 
within the time interval [0, t"] would also take place in cjcxt within the same time interval. 
Since by equation (27) the LK^^,^ norm |||u(-,t)||| always increases during and after scattering, 
it is safe to bound |||u(-,t)||| for t hy the L^3/2 norm of the wavefield in the medium cjext 
for times t ^ t", in particular t ^ 00. (This reasoning is why we needed the lower bound in 
(27), hence the introduction of the special weighted norm.) 

5. Finally, let us now bound the L^3/2 norm of the recursively split wavefield u in the medium 
(Text) and show that it converges to a bounded limit when t — 00. 

Again, assume without loss of generality that the initial bump f is right-going, and supported 
in some interval [xk-i,Xk] C [0, 1]. We extend the partition {xj} of [0, 1] into the partition 
{xj — 1} U {xj} U {xj + 1} of [—1, 2], indexed by the single parameter — M + 1 ^ j ^ 2M in 
the obvious way. 

We can identify various contributions to the bound on |||u|||: 
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• The fully transmitted bump, with magnitude 11 ^ '^j,r ^ 1- 

• The bumps undergoing one reflection, with combined magnitude less than 

k+M-l M-l 
i\=k k^jiKii j2<ii i=0 

• The bumps undergoing two reflections, with combined magnitude less than 

fe+M-l ii-l M-l M-l 

ii=k 12=11— M k^i<ii 32<ii j3>i2 i=0 i=0 

etc. 

• The bumps undergoing 2n reflections, with combined magnitude less than 

(M-l \ /M-l \ 

Ei^dj ■n2\R^,r\] ■ 

Sums of reflection coefficients can be related to the total variation of log a; by making use of 
the identity 

1,, 

— — i^-logx, x>0, 
+ 2 

we easily get 

M-l ^ M-l ^ 

i=0 i=0 

The same bound holds for ^ |-Ri,r|- The bound for |||u||| is a geometric series with sufficient 
convergence criterion 

Var(logo-) < 2, 

and value 

1 

u ^ 



1 — ^Var(log(7) 

For times beyond t", of the form t ^ nt" for some integer n, this construction can be iterated 
and the factor [1 — ^Var(log cr)]~-^ needs to be put to the power n. 

We have taken to be a right-going bump so far, but if instead we consider general initial 
conditions uq and Ui over the same support, then we should form two one-way bumps as 
V± = UQzizajUi in the interval [ ]. Wc can now 1) go back to u through |||u||| ^ |||u|||, 

2) pass to the limits M — > oo, /i — > 0, and 3) use the equivalence of and -^^3/2 norms to 
gather the final bound as 

H-Ml^ ^ I /^"-^"^'l' (II^oIIli + ¥UA\l^\ when t ^ ntK 
(1 - iVar(logcr)) 

6. We now return to the case of Dirichlet or Neumann boundary conditions. Bumps meeting 
X = or X = 1 reflect back inside [0, 1] with no modification in the or -^^3/2 norm. 
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The extended medium giving rise to equivalent dynamics should then be defined by mirror 
extension instead of periodization, as 



f^pc(l) 


if X ^ - 


1; 




if-1 < 






if < .T 


^ 1; 


cjpc(2 - x) 


if 1 < X 




CTpc(O) 


if X ^ 2 





The reasoning proceeds as previously in this extended medium. The discontinuities of (Text (a;) 
are the points {xj; —M + 1 ^ j ^ 2M} of [—1, 2] defined as the proper reindexing of {—Xj} U 
{xj} U {2 - Xj}. 

If we define Rj^^ and Rj^r as the reflection coefficients at x = Xj, — M + 1 ^ j ^ 2M, then 
the study of combined amplitudes of reflected bumps involves the quantities 

k+M-l k+M-1 

^ Ri^e, and ^ Ri^r- 

i=k i=k 

Because the extension is now mirror instead of periodic, each of these sums may involve 
a given reflection coefficient, Rj^£ or Rj^r, twice within a span of length M of the index j. 
Therefore we can only bound each sum individually by Var(logc7) instead of iVar(logc7) as 
previously. The reasoning continues exactly like before, with this loss of a factor 2. 

□ 

Let us make a few remarks. 

• For the application to the sparse recovery problem, the weighted LJ. norm is used instead. 
In the last steps of the proof we can use the equivalence of L^^3/2 and L^. norms to slightly 
modify the estimate into 

^ a{x)\u{x,t)\dx ^2(^^^^ ' '^"'(/ (^{x)\uoix)\dx + a^{x)\Ui{x)\dx^ . (30) 

• The constant in the estimate (21) depends on time only through the maximum number of 
rotations around the periodized interval [0,1]. That this constant does not tend to one as 
t — > O"*" is the expected behavior, because a single scattering event whereby a bump splits 
into two or more bumps can happen arbitrarily early. 

• On the other hand we do not known if the condition Var(log(T) < 1 (Dirichlet or Neumann), 
or Var(log(j) < 2 (periodic boundary condition) is essential for an estimate to hold. 
Any proof argument that would attempt at removing a condition of this kind — or explain 
its relevance — would need to account for the combinatorics of destructive interfererence that 
occurs in regimes of multiple scattering. This question may offer a clue into localization 
phenomena. 

• The reader may wonder why we have only included initial conditions and no forcing to 
the wave equation, as is customary in Strichartz estimates. The presence of an additional 
forcing F{x,t) in equation (1), however, would spoil sparsity for most choices of F. Only 
well-chosen forcings, properly localized and polarized along bicharacteristics relative to the 
initial conditions, have any hope of preserving the peaky character of a solution to the wave 
equation — otherwise energy would be introduced and distributed among too large a set of 
bicharacteristics . 
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• Lastly, an estimate such as (21) would not generally hold for media that are not of bounded 
variation. This phenomenon can be illustrated in the small e limit of a slab of random 
acoustic medium with correlation length 0{e^), slab width 0(1), and impinging pulse width 
0{e). This situation is considered in Chapter 9 of [37], where it is also shown that the intensity 

of the reflected wave decays like in expectation, hence 1/t for the wave amplitude. The 
corresponding picture at fixed t is that of a heavy-tailed wave that decays like 1/x — hence 
does not belong to . 



3.2 Analysis of Incoherence 

In this section we prove a result of extension, or incoherence of the eigenfunctions of the operator 
CF~'^{x)cP /dx^ on the interval [0, 1], with Dirichlet (u(0) = u{l) = 0) or Neumann (■u'(O) = u'{l) = 0) 
boundary conditions. Recall that the natural inner product in this context is 



{f,9)=f f{x)g{x)a'^{x)dx, 
Jo 



with corresponding norm 



Theorem 6. Let logcr G SF([0, 1]), and let Vu;{x) obey v'^{x) = —uj^a^{x)v^{x) on [0,1] with 
Dirichlet or Neumann boundary conditions. Then 



\\o'Vu,\\l'^ ^ \/2 exp (Var(log(T)) • ||fa)||i2 . (31) 

The point of this result is that the quantity exp (Var(log cr)) docs not depend on lo. 

Proof. Let us first discuss existence and smoothness of Vi^{x) in [0,1]. The operator a~'^{x)-^ 
with Dirichlet or Neumann boundary conditions is not only self-adjoint but also negative semi- 
definite with respect to the weighted inner product ■ Hence by spectral theory there exists 

a sequence of eigenvalues ^ Ai < A2 < . . . and corresponding eigenvectors in L^2, orthonormal 
for the same inner product. (Basic material on Sturm-Liouville equations and spectral theory in 
Hilbert spaces can be found in [23].) We denote a generic eigenvalue as A = o;^, and write 

v'!o{x) = -uj^a^{x)vuj{x). 

This equation in turn implies that G ^'^^^i^, 1), hence also belongs to L^(0, 1), i.e. v^^ is in the 
Sobolev space -ff^(0, 1). Iterating this regularity argument one more time, we can further conclude 
that is in the space of functions whose second derivative is in BV . 

Let us now fix w, remove it as a subscript for notational convenience, and consider the quantity 

I{x) = \v{x)\^+ 



(jo'^a'^{x) 

Take cr G for the time being, and notice that a few terms cancel out in the expression of I'{x); 

/'(x) = -2(lcg.(x))'^^ 

We can now bound 

l'{x)^-2\{\oga{x))'\I{x), 
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and use Gronwall's inequality to obtain a useful intermediate result on the decay of I{x), 

/(x) ^/(O)exp {-2 j^^ \{\oga{y))'\dy^ . (32) 

In general, cr{x) is only of bounded variation, but the inequality (32) remains true if written as 

lix) ^ 1(0) exp (-2Var,(log a)) , (33) 

(In fact, a slitghly stronger result with the positive and negative variations of log(c7) holds.) For 
conciseness we justify this result in the Appendix. 

The quantity a'^{x)I{x) is a continous function over [0, 1], therefore absolutely continuous, and 
reaches its maximum at some point x*. No special role is played by the origin in the estimate (33), 
hence, for all x G [0, 1], we have 

max\a{x)v{x)f < a'^{x*)I{x*) ^ exp {2Yaj:{loga)) a^{x)I{x). 
Integrate over [0, 1] ; 

WcrvWl^ ^ exp (2Var(logc7)) ||/||^2 . 



Now 



r a\x)I{x)dx = r a^xMx)]"^ dx + ^^^^ dx. 
Jo Jo Jo ^ 



It is easy to verify that both terms in the right-hand side of the above equation are in fact equal 
to each other, by multiplying the equation v" -\-LtP'a'^v = with v, integrating by parts over [0, 1], 
and using the boundary conditions. Therefore 



\\(Tv\\^ ^ 2 exp (2Var(logc7)) \\v\\^2 , 

which is the desired result. 

□ 

A few remarks on related work and extensions are in order. 
• The argument can be slightly modified to obtain instead 

'\Vuj\\l^ 



t2 



I'd; cx) 

exp (Var(log cr)) 



The quantity I{x) that appears in the proof is, morally, the time-harmonic counterpart of the 
sideways energy E{x) = + iffH where u would now solve (20). Sideways 

refers to the fact that integration is carried out in t instead of x. This trick of interchanging 
t and X while keeping the nature of the equation unchanged is only available in one spatial 
dimension. It was recognized in the 1980s by W. Symes that "sideways energy estimates" 
allowed to prove transparency of waves in one-dimensional BV media [56, 40], a result very 
close in spirit to the eigenfunction result presented here. Independently, E. Zuazua [61], as 
well as F. Conrad, J. Leblond, and J. P. Marmorat [21], used similar techniques for proving 
controllability and observability results for waves in one-dimensional BV media. See [62] for 
a nice review. 
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• Theorem 6 is sharp in the sense that for each < s < 1, there exists a medium cr{x) in the 
Holder space C*([0, 1]) for which there exists a sequence of eigenfunctions exponentially and 
arbitrarily localized around, say, the origin. Notice the embedding C BV, but C BV 
for s < 1. The construction is due to C. Castro and E. Zuazua, see [16]. In our setting, it 
means that (31) cannot hold for such a{x), because the constant in the right-hand side would 
have to depend on lu. 

• Related results in dimension two and higher, using local norms on manifolds with metric of 
limited differentiability, e.g. C^'^, can be found in [52, 53, 39]. Interestingly, the constant 
in front of the Lf^^ norm in general grows like a fractional power law in A, allowing the 
possibility of somewhat localized eigenfunctions in dimensions greater than two, typically 
near the boundaries of the domain. 

• Physically, one may relate the total variation of log a to a notion of localization length L, for 
instance as the largest x such that Var2;(logcj) is less than some prescribed constant C. Then 
L dictates the decay rate of eigenfunctions, much in the spirit of Lyapunov exponents for the 
study of localization for ergodic Schrodinger operators. 



3.3 Analysis of Eigenvalue Gaps 

This section contains the eigenvalue gap result. Notice that it is the square root of the eigen- 
values which obey a uniform gap estimate. 

Theorem 7. Let log cr G -BT^([0, 1]) with Var(log(j) < vr. Let Xj = —^j, j = 1,2, be two distinct 
eigenvalues of (x) (f / dx^ on [0,1] with Dirichlet or Neumann boundary conditions. Then 

TT- Var(logo-) < |^_^ _ < 7r + Var(logcr) ^^^^ 
Jla{x)dx ^ ^ /o(T(x)dx 

Proof. Consider a generic eigenvalue A = —uP with eigenfunction u{x), and assume that a G 
C^([0, 1]). This restriction will be lifted by a proper limiting argument. 

The quantity I{x) introduced in the proof of Theorem 6, should be seen as the square of the 
radius r(x), in a polar decomposition 

u'{x) = uu{x)r{x)cosO{x), u{x) = r{x)sm9{x). 

If u' 7^ 0, then tan 6 = coa^; and if = 0, then cot & = (We have already seen that u and u' 

cannot simultaneously vanish since r^(x) = L{x) > everywhere.) Prom either of those relations 
one readily obtains 

0'{x) = (log(j(x))' sme{x)cose{x)+coa{x). (35) 

It is interesting to notice that the radius r{x) does not feed back into this first-order equation for 
6{x). Note also that the second term in the above equation quickly dominates as a; — oo; if the 
nonlinear term is neglected we get back the WKB approximation. The boundary conditions on 6 
are: 

Dirichlet: 6(0) = rmr, 0(1) = rnr; 

TT TT 

Neumann: ^(0) = — + mir, 6{1) = — + mr. 

where m and n are arbitrary integers. Without loss of generality, set m = 0. There is also a 
symmetry under sign reversal of both 9 and lo, so we may restrict n ^ 0. 
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Equation (35) with fixed 9(0) is an evolution problem whose solution is unique and depends 
continuously on w. Moreover, the solution is strictly increasing in ui, for every x, as can be shown 
from differentiating (35) in lo and solving for dO/du using Duhamel's formula. 

The successive values of co that correspond to eigenvalues A = — u;^ are therefore determined 
by the (quantization) condition that 9(1) = nvr for some n > (Dirichlct), or 9{1) = 7r/2 + nvr for 
some n ^ (Neumann). By monotonicity of ^(1) in u), there is in fact a bijective correspondence 
between n and to. 

As a result, two distinct eigenvalues Xj = —oJ^, j = 1,2, necessarily correspond to a phase shift 
of at least tt at x = 1. Set 9j{x) for the corresponding phases. Then, by (35), 

{9i — 92)' (x) = (log cr)'(x) [sin ^1 cos^i — sin ^2 008^2] + {(^1 — u!2)o'{x). 

Integrate in x and use the boundary conditions to find 

/ (logay{x)[sm9iCOs9i — sm92COs92]dx + {LOi — u>2) / a{x)dx, n^O. 
Jo Jo 

The factor in square brackets is bounded by 1 in magnitude, therefore 

1 

- ui2\ ^ 71— ——(tt - / |(logc7)'(a;)|dx), 
Jq a{x)dx Jo 

and also 

1 /-i 
1^1-^2! ^ „i , , , (7^+ / |(logo-)'(x)|(ix), 
Jq a{x)dx Jo 

This proves the theorem in the case when a £ {[0,1]). A standard limiting argument shows 
that the properly modified conclusion holds when a € BV{[0, 1]); we leave this justification to the 
Appendix. 

□ 

A few remarks: 



• The polar decomposition used in the proof of Theorem 7 is a variant of the so-called Priifer 
transformation [1], which is more often written as u'{x) = r{x) cos9{x), u{x) = r{x) sin9{x). 
This simpler form does not appear to be appropriate in our context, however. Notice that 
such polar decompositions are a central tool in the study of waves in random media [37] . 

• It is perhaps interesting to notice that Var(logcr) < tt is a sufficient condition identified by 
Atkinson in [1] for the convergence of the Bremmer series for the Sturm-Liouville problem. 



4 Algorithms 

This section discusses some of the finer points of the implementation. Basic discretization issues 
were exposed in Section 2.2. 
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4.1 Extraction of the Eigenvectors 

The first part of the algorithm consists in extracting a random set of eigenvectors {vcj}cj of 
the discretized operator C = T,~^L where S = diagj{a\j]). The discretized Laplacian L over M is 
computed spectrally 

Lv[m] = — 47r^m^{)[m] 

where m G {—N/2 + 1, . . . , N/2} indexes the frequencies of the discrete Fourier transform. Fourier 
transforms are computed in 0{N log N) operation with the FFT. 

Since we arc interested in extracting only a few eigenvectors chosen at random, we use an 
iterative method [2] that parallelizes trivially on multiple processors. Each processor computes and 
stores independently from the others a few eigenvectors using an iterative method. 

The simplest way to compute an eigenvector v^, whose eigenvalue —u^ is closest to a given — 
is to compute iterative inverse powers 

with an adequate starting guess typically white noise. In practice we use a variant of this 
power iteration called the restarted Arnoldi method, and coded in Matlab's eigs command. At 
each iteration we approximately solve the linear system (£ + Ugldjij^'^^^ = v^^ with a few steps of 
stabilized bi-conjugate gradient [5]. Recent work [34] suggests that a shift in the reverse direction 
C — WqH or a complex shift C + icDgld are good preconditionners for this linear system resolution. 
Such preconditoners are applied efficiently using multigrid, or alternatively and as used in this 
paper, using discrete symbol calculus. In this framework, it is the whole symbol of the operator 
{£. — CoQld)~^ which is precomputed in compressed form, and then applied iteratively to functions 
on demand. The resulting preconditioners are quite competitive. See [24] for more information. 

Each shift should be chosen according to an estimate of the true (but unknown) eigenvalues 
repartition to sample as uniformly as possible the set of eigenvectors of JC. The eigenvalues of 
the discrete Laplacian in a constant medium a[j] = ctq are {— a;^ax(2'7i/-^)^}^=_7V/2+i where 
"^max — TT^A^^/fiQ. Treating the general case as a perturbation of this constant setting leads draw 
lDq uniformly at random in [0, cUmax] where ajmax defined as the maximum eigenvalue of C. The 
value of Wmax is readily available and computed using power iterations on C We have seen in 
Section 3.3 that the departure from uniformity is under control when the medium has a reasonable 
total variation. We also explained that the sampling should be without replacement: in practice 
the implementation of "replacement" carefully accounts for the multiplicity two of each eigenspace 
in the case of periodic boundary conditions. 

4.2 Iterative Thresholding for £^ Minimization 

At the core of the compressive wave computation algorithm is the resolution of the optimization 
problem (14) involving the norm. We introduce the operator $ : i— > Mp such that 

^u[lv] = ^tx[j]^ci[j] 

j 

where u e fl indexes K = |Q| eigenvectors {va,}a> of the discretized Laplacian and S = diagj (c7[j]). 
Discarding the time dependency, the optimization (14) is re- written in Lagrangian form as 

min - cf + a[j]|ub1l- (36) 

j 
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The Lagrangian parameter A should be set so that ||<I>S^u — c|| ^ e. 

As described in Section 2.2, e account for the discretization error, and it can also reflects errors 
in computation of the eigenvectors. This quantity can be difficult to estimate precisely, and it can 
be slightly over-estimated, which increases the sparsity of the computed approximation. 

Iterative algorithms solves the minimization (36) by sequentially applying a gradient descent 
step to minimize |^>S^tt — c|| and a soft thresholding to impose that the solution has a low weighted 
norm cr[i]|wb]|- This algorithm was proposed independently by several researcher, see for 
instance [22, 20, 35], and its convergence is proved in [22, 20]. 

The steps of the algorithm are detailed in Table 1. They correspond to the application of the 
iterative thresholding algorithm to compute the iterates Yiu'^^^ with the measurement matrix 
Since this matrix satisfies I^Snf ^ ||u|| by Plancherel, these iterates converge to a minimizer of 
(36). 

Since the correspondence between e and A is a priori unknown, A is modified iteratively at step 
4 of the algorithm so that the residual error converges to e, as detailed in [17]. 

An important feature of the iterative algorithm detailed in Table 1 is that it parallelizes nicely 
on clusters where the set of eigenvectors {vcd}^^^^ are distributed among several nodes. In this case, 
the transposed operator <I>* is pre-computed on the set of nodes, and the application of and 
is done in parallel during the iterations. 

The iterative thresholding algorithm presented in Table 1 might not be the fastest way to solve 
(36). Recent contributions to sparse optimization include for instance primal-dual schemes [59], 
gradient pursuit [7], gradient projection [36], fixed point continuation [31], gradient methods [46], 
Bregman iterations [58] and greedy pursuits [45]. These methods could potentially improve the 
speed of our algorithm, although it is still unclear which method should be preferred in practice. 

Another avenue for improvement is the replacement of the norm by non-convex functionals 
that favor more strongly the sparsity of the solution. Non-convex optimization methods such as 
FOCUSS [38], re- weighted [15] or morphological component analysis with hard thresholding 
[54] can lead to a sub-optimal local minimum, but seem to improve over minimization in some 
practical situations. 
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1 



Initialization: set u^^^ = and k = 0. 



2 



Update of the solution: compute a step of descent of ||$S^n — c| 




3 



Minimize £ 



norm: threshold the current update 



where the soft thresholding operator is defined as 




if |a| < A, 

a — sign(Q;)A otherwise. 



4, 



Update the Lagrange multiplier: set 



A 



e 



||$S2u - C| 



5 



Stop: while — u^'^^j > tol, set A; A; + 1 and go back to 2. 



Table 1: Iterative thresholding algorithm to solve (36). 



4.3 Sparsity Enhancement 

The success of the compressive method for wave propagation is directly linked to the sparsity 
of the initial conditions uq and ui. To enhance the performance for a fixed set of eigenvectors, 
the initial data can be decomposed as uq = 'YliZo'^o where each of the L components {uq}^ is 
sufficiently sparse, and similarly for ui. The algorithm is then performed L times with each initial 
condition Uq and the solution is then recomposed by linearity. It would be interesting to quantify 
the slight loss in the probability of success since L simulations are now required to be performed 
accurately, using the same set of eigenvectors. 

Since the solution might become less sparse with time t increasing, one can also split the time 
domain into intervals [0,t] = \J^[ti,ti+i], over each of which the loss of sparsity is under control. 
The algorithm is restarted over each interval [tj, ij+i] using a decomposition of the wavefields at 
time ti into a well chosen number L = Lt^ of components to generate sparse new initial conditions. 

5 Numerical Experiments 

5.1 Compressive Propagation Experiments 

We perform simulations on a ID grid of AT = 2048 points, with impedance a{x) of various 
smoothness and contrast o-jnax/o"inin- The result of the compressive wave computation is an ap- 
proximate discrete solution {u\j]{t)}^~Q at a fixed time step t of the exact solution 
the discretized wave equation. 

The performance of the algorithm is evaluated using the recovery error in space and at several 
time steps ti = Ti/ut for i = 0, . . . , nt — 1 uniformly distributed in [0, T], where rij = 100. The final 
time is evaluated such that Jq o' = 1, so that the initial spike at t = propagates over the whole 
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domain. This error is averaged among a large number of random sets f2 G i^x of K eigenvectors 

-. nt-lN-l 

Erria,K/Nf = E E E l^bK*.) " miU)\'. (37) 

Each set Q G is drawn at random using the procedure described in Section 4.1. 

This error depends on the sub-samphng factor K/N, where K = \U\ is the number of computed 
eigenvectors, and on the impedance a of the medium. Numerical evaluation of the decay of Err with 
K are performed for two toy models of acoustic media that could be relevant in seismic settings: 
smooth a with an increasing number of oscillations, and piecewise smooth a with an increasing 
number of discontinuities. For each test, the initial condition uq is a narrow gaussian bump of 
standard deviation 7/N. 

Smooth oscillating medium. A uniformly smooth impedance parameterized by the number 
of oscillations 7 G [1,20] is defined as 

^7b1 = ^^^^ + ^^^^ (sin(2;r7i/iV) + 3) . (38) 

The contrast (Tmax/cmin = cTinax ^ [1, 10] is also increased linearly with the complexity 7 of the 
medium, according to 1 + y|j(7 — 1). 

Figure 1 shows how the recovery error Eii^a^, K/N) scales with complexity 7 of the medium 
and the number of eigenvectors K. For media with moderate complexity, one can compute an 
accurate solution with N/10 to eigenvectors. 



Piecewise smooth medium. Jump discontinuities in the impedance a reflect the propagating 
spikes and thus deteriorate the sparsity of the solution when t increases, as shown on Figure 2, 
top row. Figure 2 shows that compressive wave computation is able to recover the position of the 
spikes with roughly N/5 to iV/4 eigenvectors. 

To quantify more precisely the recovery performance, we consider a family of piecewise smooth 
media parameterized by its number of step discontinuities 7 G [1, 20]. The contrast (Tmax/fmin S 
[1, 10] is also increased linearly with the complexity 7 of the medium. The discontinuities are 
uniformly spread over the spatial domain [0,1]. The piecewise smooth impedance cr^ is slightly 
regularized by a convolution against a Gaussian kernel of standard deviation 5/N. This tends to 
deteriorate the sparsity of the solution when t increases, but helps to avoid numerical dispersion 
due to the discretization of the Laplacian. Figure 3 shows how the recovery error Err((T-y, K/N) 
scales with complexity of the medium and the number K of eigenvectors. 



5.2 Compressive Reverse-Time Migration 

In this example, we consider an idealized version of the inverse problem of reflection seismol- 
ogy, where wavefield measurements at receivers are used to recover some features of the unknown 
impedance a{x). Our aim is to show that compressive wave computation offers significant mem- 
ory savings in the implementation of a specific adjoint-state formulation that we call snapshot 
reverse-time migration. 



Review of ID reverse-time migration. We will assume a one-dimensional setup as in the rest 
of this paper, i.e., 

9, ,d'^u du ^ 
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Figure 1: Compressive wave propagation in a smooth medium. Top row: recovery error decay 
logiQ(Err(cr'y, K/N)) as a function of the sub-sampling K/N for various complexity 7 of the medium. 
Each black dots corresponds to the error of a given random set O (the red curve is the result of 
the averaging among these sets). Bottom row: 2D display of the error \ogiQ{/Eii {a ^, K/N)) as a 
function of both K/N (horizontal axis) and 7 (vertical axis). 

with a known localized initial condition m(j;,0) = uq{x), du/dt{x,0) = ui{x), and over a domain 
sufficiently large that the choice of boundary conditions does not matter. The x coordinate plays 
the role of depth. As in classical treatments of reflection seismology, the squared impedance is 
perturbed about a smooth, known reference medium 

o-^(x) = a^ix) +r{x), 

where the high-frequency perturbation r(x) has the interpretation of "reflectors". The wave equa- 
tion is then linearized as 



CJo(x) 



-r(x)- 



^mc 



(40) 



dt'^ dx"^ dt'^ ' 

where the incoming field lij^c solves the wave equation in the unperturbed medium ctq. An im- 
portant part of the seismic inversion problem — the "linearized problem" — is to recover r{x) from 
some partial knowledge of u{x,t), interpreted as solving (40). We will assume the availability of 
snapshot data, i.e., the value of u and du/dt at some fixed time T, 



{di{x),d2{x)) 



fill 

{u{x,T),—{x,T)), 
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Figure 2: Examples of approximate solution u[j]{t) for a piecewise smooth medium. 

possibly restricted in space to a region where the waves are reflected, by opposition to transmitted. 
We then let -F[o"q] for the forward, or modeling operator, to be inverted: 

(*) = FW-ir. (41) 

A more realistic seismic setup would be to assume the knowledge of u{0, t) for positive t, but 
the step of going from (^1,^2) to u{0,t), or vice- versa, is a depth-to-time conversion that should 
present little numerical difficulty. While assuming trace data u{0, t) would give rise to an adjoint- 
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Figure 3: Compressive wave propagation in a piecewise smooth medium. 

state wave equation with a right-hand side, working with snapshot data has the advantage of casting 
the adjoint-state equation as a final-value problem without right-hand side. 

More precisely, a simple argument of integration by parts (reproduced in the Appendix) shows 
that the operator F in (41) is transposed as 



where q solves the adjoint-state equation 



0"n X 



,5^ 

dt^ 



}{x,t) 



dq 



mc 



5*2 



(x, t)dt, 



0, 



(42) 



(43) 



with final condition 



q{x,T) 



-d2{x), 



dq 
di 



ix,T) 



a^ix) 



di{x). 



(notice the swap of di and d2, and the minus sign.) 

In nice setups, the action of -F*[c7q], or F* for short, called imaging operator, is kinematically 
equivalent to that of the inverse F~^ in the sense that the singularities of r are in the same location 
'di' 



as those of F* 



d2 



. In other words the normal operator F*F is pseudodifferential. We also show 



in the Appendix that F* is the negative Frechet derivative of a misfit functional for snapshot data, 
with respect to the medium a'^, in the tradition of adjoint-state methods [48]. Hence applying the 
imaging operator is a useful component of solving the full inverse problem. 
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A standard timestepping method is adequate to compute (42); the adjoint wave equation is first 
solved until time t = 0, then both u and q are evolved together by stepping forward in time and 
accumulating terms in the quadrature of (42). However, this approach is not without problems: 

• The CFL condition restricting the time step for solving the wave equation is typically smaller 
than the time gridding needed for computing an accurate quadrature of (42). The potential 
of being able to perform larger, upscaled time steps is obvious. 

• The backward-then-forward technique just discussed assumes time-reversibility of the equa- 
tion in q (or conversely of the equation in u), a condition that is not always met in practice, 
notably when the wave equation comes with either absorbing boundary conditions or an ad- 
ditional viscoelastic term. For non-reversible equations, computation of (42) comes with a 
big memory overhead due to the fact that one equation is solved from t = to T, while 
the other one is solved from t = T to 0. One naive solution is to store the whole evolution; 
a more sophisticated approach involves using checkpoints [57], where memory is traded for 
CPU time, but still does not come close to the "working storage" in the time-reversible case. 
In this context, it would be doubly interesting to avoid or minimize the penalty associated 
with time stepping. 

Numerical validation of compressive reverse time migration. As a proof of concept, we 
now show how to perform snapshot reverse-time migration (RTM) without timestepping in the case 
of the reversible wave equation, on a ID grid of AT = 2048 points. The approach here is simply to 
compute independently each term of a quadrature of (42) using the compressive wave algorithm 
for u and q. We leave to a future project the question of dealing with 2D and 3D non-reversible 
examples, but we are confident that most of the ideas will carry through. 

The smooth medium is defined as in (38) with 7=1 (one oscillation) and a contrast 
'^max/'^min ~ '^'^^ reflectors r{x) is a sum of two Gaussian bumps of standard deviation 7/N 
and amplitude respectively —0.6 and 0.6, see figure 4, top row. 

We first compute the input di and ^2 of the RTM by computing the solution u{x, t) of the 
wave equation in the perturbed medium a"^ = + r, and then evaluate di{x) = u{x,T) and 
d2{x) = ^{x,T). The initial condition u{x,0) is a second derivative of a Gaussian of standard 
deviation 7/N. This simulates seismic observations at time t = T, see figure 4, top row. 

The algorithm proceeds by computing approximations q(x, ti) and u\nc{x, ti) of the forward and 
backward propagations q and -Uinc at Nf = N/10 equispaced times {ti}^Q^. These approximations 
are computed for each ti independently, without time stepping, by using the compressive algorithm 
with a small set ol K < N eigenvectors. The RTM estimation of the residual r{x) is obtained by 
discretizing (42) 

1 d'^u- 

^(^) = — q{x,ti)^^{x,ti), 

1=0 

where the derivative is computed using finite differences. 

The success of compressive RTM computations is measured using an error measure Err(A'/A^) 
obtained similarly to (37) by averaging over several randomizations for the sets G ilx of \ = K 
eigenvectors 

N-l 

^^^^^/^)' = Wm 5: Ei^~obi-rbii^ 

' " ' " " ne^K i=o 

where fo is the RTM estimation obtained with the full set of A^ eigenvectors. 
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Figure 4, bottom row, displays the decay of Err(i^/A^) with the number of eigenvectors used for 
the compressive computations. This shows that roughly 20% of eigenvectors are needed to reach 1 
digit of accuracy, and 30% to reach 2 digits of accuracy. 

Note that this simple RTM method cannot be expected to recover the original r{x) accurately, 
mainly because we have not undone the action of the normal operator F*F in the least-square 
treatment of the linearized problem. 

Also note that the compressive algorithm was run "as is", without any decomposition of the 
initial condition, or split of the time interval over which each simulation is run, as in Section 4.3. 
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Figure 4: Top row: perturbed speed a"^ and input di of the RTM computations. Middle row: 
reference solution rj^TM (^nd approximated solution rj^TM for K/N = 0.2. Bottom row: recovery 
error decay \ogiQ(Eii{K / N)) as a function of the sub-sampling K/N . Each black dot corresponds 
to the error of a given random set Q. (the red curve is the result of the averaging among these sets). 
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6 Discussion 



A number of questions still need to be addressed, from both a mathematical and a practical 
viewpoints. 

• Number of eigenvectors. While it is easy to check the accuracy of a compressive solver relative 
to a standard scheme, an important question is a posteriori validation without comparison 
to an expensive solver. Were enough eigenvectors sampled? One way to answer this question 
could be to compare simulation results to those of a coarse-grid standard finite difference 
scheme. Another possibility is to check for convergence in K, as the recovered solution 
stabilizes as more and more eigenvectors are added. To the best of our knowledge compressed 
sensing theory does not yet contain a theoretical understanding of a posteriori validation, 
though. 

• Two and three-dimensional media. Passing to a higher-dimensional setting will require using 
adequate bases for the sparsity of wavefields. Curvelets is one choice, and wave atoms is 
another one, as both systems were shown to provide to Ip boundedness for all p > 0, of 
the wave equation Green's functions in media [10, 51]. (Wavelets or ridgelets would only 
provide Ip to £i boundedness.) The sparsity question for wave equations in simple nonsmooth 
media, e.g. including one interface, is to our knowledge completely open. The question of 
incoherence of the eigenfunctions with respect to such systems will be posed scale-by-scale, 
in the spirit of wavelets vs. Fourier as in [18] for instance. Incoherence or extension questions 
for generalized eigenfunctions may also be complicated by the presence of infinite-dimensional 
eigenspaces in unbounded domains. 

• Parallel computing. An exciting outlook is that of scaling the compressive approach to clusters 
of computers where properly preconditioned and domain-decomposed Helmholtz equations 
are solved in an entirely parallel fashion over w. The i\ solver should be parallelized too, and 
for the iterative schemes considered the two bottleneck operations are those of applying the 
incomplete matrix of eigenvectors and its transpose. For reverse-time migration where several 
times need to be considered, an outstanding question is that of being able to reliably predict 
the location of large basis coefficients at time t from the knowledge of wavefields already solved 
for at neighboring times t — At and t + At. Note that the advocated "snapshot" variant of 
reverse-time migration in Section 5.2 contains a preliminary time-to-depth conversion step. 

• Other equations and random media. Any linear evolution equation that obeys interesting 
sparsity and incoherence properties can in principle be studied in the light of compressive 
computing. This includes the heat equation in simple media; and possibly also the (one- 
particle) Schrodinger equation in smooth potentials when a bound on wave numbers of the 
kind 1^1 < l//i is assumed, where h is Planck's constant. Interestingly, the compressive 
method also makes sense for mildly random media as the £i reconstruction empirically filters 
out the trailing "coda" oscillations when they follow behind a coherent wavefront. 

A Additional proofs 

Complement to the proof of Theorem 6. 

In order to justify (33), consider the quantity 

Ie{x) = \u{x)f + 
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where fT^ is adequately regularized in the sense of Lemma 1, but u(x) still solves the equation with 
a{x). Then by point 1 of Lemma 1, /^(x) makes sense, and 



I'eix) = (^^^^2(^^) 2Re(«'(x)u(x)) - 2(logae(x)) 



where 



Lo'^a'^{x) 

The inhomogeneous Gronwall inequality^ can be applied and yields 

I,{x) ^ 1,(0) exp (^-2^"|(logc7,(y))'|dy) - £F,{y)^W (^-2 \{loga,{z)y\dz^ dy, (44) 

-^eW = J {x)u{x)\. 

We can now invoke the different points of Lemma 1. Points 2 and 3, in conjunction with uniform 
boundedness of u and u' over [0, 1], allow to conclude that Fs{x) dx ^ C ■ e for some constant 
C. By points 4 and 5, the second term in the right-hand side of equation (44) tends pointwise to 

zero as e ^ 0. Hence by point 2 we have lim^^o Ie{x) = I{x); and the first term in the right-hand 
side of equation (44) is handled again by points 4 and 5. The inequality obtained in the limit is 

I{x) ^ 7(0) exp (-2Var^(log(7)) , 

which is what we sought to establish. 

Complement to the proof of Theorem 7. 

Let us show that (34) holds when a G BV{[0, 1]) and not just C^([0, 1]). Let be a regular- 
ization of a, in the usual sense. Define Osix) through 

. , , 1 u'(x) /, / X / N u(x) 

cotee{x) = TT^T' t&n ee{x} =ujae{x)—--, 

ujafr[x) u(x) u [x) 

where u{x) solves the Sturm-Liouville problem with the un-regularized a. Then a short calculation 
shows that 

^, , . crl(a;) . „ , . „ , . , s (y{x) sin^6'£(a;) 
Se{x) = smee{x)cose,{x)+uja{x) \ ' . 7 ( . 

(7£(x) o-e(x) sm''^(a;) 

The ratio of sine squared can be recast in terms of a familiar quantity: 

Sin^6'g(x) _ 1 + COt^6>g(x) _ 1+ a;V|(x)l^l^ _ lejx) 

sm^e(x) ~ l + cot26'(x) ~ 1+ .1, ~ I(x)' 

W^CT-'lxj I u(x) I 

The same result follows from a similar formula with cosines in case sin 9 = 0. Consider now two 
different frequencies loi, ijJ2, and integrate to get 

Oe,l{l) -Oe,2{^) = / [sin6'£,icos6'£,i -Sm9e,2COs9e,2]dx 

Jo <^e[X) 



®If ji(t) < fit) + g(t)y{t), then y(t) < 3/(0) exp [j^ g{s)ds^ + /(s)exp (^Jl g(T)dT^ ds, and vice-versa with ^ in 
both equations instead of ^. 
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The first term is bounded by \^^\dx, which tends to Var(log(7) as e — > by point 5 of Lemma 
1. The second integral can be compared to Jq a{x)dx by studying 



a—-^dx— [ adx\ = \ [ ^ {al^ — a^I) dx 

<^6 I Jo Jo 



^ / — -\a — ae\dx 



+ / -\Ie-I\dx. 

Jo 



All four factors a, I, and are bounded and bounded away from zero, by Theorem 6 for / and 
point 3 of Lemma 1 for and I^. Point 2 of Lemma 1 then implies that jcg — a\dx ^ 0. We 
have already argued in the proof of Theorem 6 that \I^ — I\dx 0. The limiting inequalities 
are (34), as desired. 

Proof of Proposition 4 

Let p = minp„, where n = 1, . . . ,N, and let py^^if = 1/N. The basic heuristic is that, even 
if the probability of drawing a given measurement decreases by a factor Pn/Punif which may be 
less than 1, this effect is at least more than offset by sampling a correspondingly larger number of 
measurements, namely K^nif Punif/p instead of -fCunif. 

First, let us see why the accuracy bound (8) still holds if we increase the probability for any 
particular a;„ to be included in the list ^k-^ In our context, assume for the moment that 

Pr(a;n e Qr) ^ K^nit Punif, (45) 

uniformly over n. Then we may reduce this setup to a situation where all Pr(aj„ € ^k) = ^unif Punif 
by a rejection sampling procedure, where each a;„ G 0,^ is further labeled as primary, or "accepted" , 
with probability (i^^unif J'unif)/Pr(a;n G ^k), and labeled as secondary, or "rejected", otherwise. 
The set of primary measurements alone would be adequate to call upon the general theory; passing 
to obvious linear algebra notations the corresponding constraints are denoted as \\Ax — b\\2 ^ e 
where A satisfies the S-RIP with high probability. Adding the secondary measurements cannot 
fundamentally hurt the recovery of ii minimization, because the constraints become 

\\Ax-b\\l + \\Bx-c\\l ^ {ef 

for some e' x e, hence in particular include \\Ax — 6||2 ^ This latter "tube" condition, together 
with the cone condition that ||a;||i should be minimized, suffice to obtain the recovery estimate on 
X as was shown in [12]. Adding a compatible constraint involving \\Bx — c\\2 only decreases the 
feasibility set, hence does not alter the recovery estimate. 

Second, it remains to show (45). An event such as a;„ G refers to the result of a sequential 
sampling of K objects among N, without replacement, and according to a certain distribution, ei- 
ther pn or Punif • This setup is different that the one considered in the classical papers on compressed 
sensing. It is shown below that 

Pr(w„ G ^^unif) = -f^unif Punif (uniform probabilities Punif) (46) 

in the uniform case, and 

Pr(u;„ G fix) ^ Kp (nonuniform probabilities Pn) (47) 



'^Interestingly, the statement that the accuracy of recovery is improved by adding measurements to the £i mini- 
mization problem, is in general false. It is only the error bound that does not degrade. 
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in the nonuniform case. In order for Pr(u;„ G ^k) to be greater than i^unif Punif as in equation 
(45), it suffices therefore that K ^ K-anif Punn/p, which proves the proposition. 

The following proof of (46) and (47) is due to Paul Rubin, who kindly allowed us to reproduce 
the argument here. 

In the case of uniform probabilities, write Punif = P for short. Denote by ftk the set of a;„ drawn 
during the first k iterations. By Bayes theorem applied to a sequence of nested (random) events, 
we have 

Pr(u;„ ^ Qfe) = Pr(wn ^ f^i) • Pr(u;„ ^ ri2|^n ^ ^^i) • • • Pr(w„ ^ fifc^n ^ ^k-i) 

-ii-p)-U-^]...(i 



{1-p) 



1-pJ "' \ l-{k-l)p 
1 — 2p\ f 1 — kp 



1-p J \l-{k-l)p 



The factors telescope in this product, with the result that Pr(a;„ ^ O^) = 1 — kp, hence Pr(a;„ G 
^k) = kp as desired. 

In the case of nonuniform probabilities Pn, consider the quantity 

qk= X] P"^- 

The qk are themselves random variables, since they depend on the history of sampling up to 
step k. Denote by Vt^ one realization of O^, and qk the corresponding realization of the random 
process qk- Then the reasoning essentially proceeds as previously, only by induction on k. First, 
Pr(a;n ^ Jli) = 1 — ^ 1 — p. Assume Pr(a;„ ^ ^k-i) ^ 1 — (^ — l)p- Then 

T-i / jol A cS \ 1 P'"' ^ ^fe— 1 Pn 
Pr(c<;„ ^ f = 1 - = — . 

J- - Qk-l 1 - Qk-l 

We claim that 

1 - qk-l l~{k-l)p ^ ' 

To justify this inequality, write the sequence of equivalences 

1 - qk-l -Pn ^ 1-kp 



1 - qk-l 1- {k- l)p 

{l-{k- l)p)(l - qk-i-Pn) ^ (1 - M(l - Qk-l) 
-Pn +{k- l)ppn ^ -P + qk-lP 

o^{pn-p)+ p{qk-i -{k- l)p)- 
The last inequality is obviously true. Therefore 

1 — kp 



l-(k-l)p' 



Because this bound is uniform over ^k-ii it is easy to see that the same inequality holds when the 
conditioning is on iOn ^ ^k-i instead of being on a; ^ 



^Indeed, if a random event B is the disjoint union \^Bi^ and Pr(A|_Bi) ^ C for all i, then Pr(^|_B) C as 
well. This fact is a simple application of Bayes's theorem: Pr(^|B) = Pr(^&B)/Pr(B) = Y^i^A-^^^i) I^A^) ^ 
C?EiPr(S0/Pr(B)=C7. 
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We conclude by writing 



1 — kp 



1- {k - l)p 
1 — kp. 



il-{k-l)p) 



Proof and discussion of equation (42) 

In this proof, a subscript t denotes a time differentiation. Consider the following quantity, 



u 



q dxdt, 



where u solves (40), and q solves (43) with final data (di(x) , d2{x)) not necessarily equal to 
{u{x,T),ut{x,T)). Because of (43), this integral is zero. Two integrations by parts in x and t 
reveal that 



= y" al{qtu)\ldx- j al{qut)\l dx 



+ 



dt'^ dx"^ 



u dxdt. 



The boundary terms in x have been put to zero from assuming, for instance, free-space propagation. 
In the first two terms, we can use n(a:, 0) = Uf(a;,0) = 0, and recognize that aQqt{x,T) = —di{x) 
and aQq{x,T) = d2{x). For the last term, we may use (40). The result is 



J di{x)u{x,T) dx + J d2{x)ut{x,T) dx = — J r{x) J q{x,t)—^^{x,t)dtdx. 
On the left, we recognize {(^J^ , F[al]r), where the inner product is in 

L2(R",R2)_ The right- 
hand-side is a linear functional of r(x), from L2(R",M) to M, hence we identify 



^(^'*)~^^(^'*)^*' 



as required. 

Notice that this formula for the adjoint operator is motivated by a standard optimization 
argument that we now reproduce. Consider the misfit functional J[a^,u] for the data (^1,^2), 
defined as 

J[a^,u] = ^J {u{x,T) - di{x)f + {ut{x,T) - d2ix)f dx, 

and under the constraint that u solves the original wave equation (39). Then the form of F* is 
inspired by the negative Frechet derivative — ^[ctq, nine]- In order to see this, we can define a dual 
variable q{x,t) corresponding to the constraint (39), dual variables Ho{x) and Hi{x) corresponding 
to the initial conditions for u, and write the Lagrangian 



W dx^ 



L[a^;u,q,no,fJ'i] = J[cr'^,u\- Jq 
The same integrations by parts as earlier give 

L[a^;u,q,fj,o,ni\= J[a^,u]- J 



udxdt + J iJ,Qu{x,0)dx + J fj,iut{x,0)dx. 



a 



qdxdt 
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+ J o^{qut)^dx — j a"^ {qtu)\^ dx + J iJ,QUQdx + J fxiuidx. 



The Prechet derivatives of L with respect to u{x,t), u{x,T), and ut{x,T) reveal the adjoint-state 
equations: 

6L 



5u{x, t) 
SL 



■ 2^ _ ^ 

dt'^ dx'^ 



q{x,t) = 0; 
u{x,T) - di{x) - a'^qt{x,T) = 0; 



6u{x, T) 

ut{x, T) - d2{x) + a^q{x, T) = 



Sut{x,T) 

The derivative of L with respect to cr^ gives the desired formula 



The derivatives with respect to q and /xQ) A^i replicate the constraints, and finally the derivatives 
with respect to it(x,0) and 0) are not interesting because they involve //q, Mij which do not 
play a role in the expression of bJ jba^ . 

Notice that the final conditions for q involve the predicted wavcficlds 7i and ut . If we put a = ctq 
however, and ctq is smooth, then u = itinc- In that case, essentially no reflections occur and the 
wavefields u\^c{x^ T), dUinc/dt{x, T) are zero at time T, in the region of interest where the data lies. 
It is only when making corrections to the guess cri that the predicted wavefields are important in 
the final condition for q. 

References 

[1] F. V. Atkinson. Wave propagation and the bremmer series. J. Math. Anal, and AppL, 1:255, 
1960. 

[2] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst. Templates for the Solution 
of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia, PA, 2000. 

[3] A. Bamberger, G. Chavent, and P. Lailly. Etude mathematique et numerique d'un probleme 
inverse pour I'equation des ondes a une dimension. Rapport IRIA-Laboria no. 226, 1977. 

[4] A. Bamberger, G. Chavent, and P. Lailly. About the stability of the inverse problem in 1- 
d wave equations: application to the interpretation of seismic profiles. Applied Math, and 
Optim., 5(l):l-47, 1979. 

[5] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, 
C. Rominc, and H. Van der Vorst. Templates for the Solution of Linear Systems: Building 
Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994. 

[6] G. Beylkin, R. Coifman, and V. Rokhlin. Fast wavelet transforms and numerical algorithms. 
Commun. on Pure and Appl. Math., 44:141-183, 1991. 

[7] T. Blumensath and M. E. Davies. Gradient pursuits. IEEE Trans. Signal Proc, 56(6):2370- 
2382, 2008. 



42 



[8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, 
New York, 2004. 

[9] K. Bredies and D. A. Lorenz. Iterative soft-thresholding converges hnearly. submitted, 2007. 

[10] E. Candes and L. Demanet. The curvelet representation of wave propagators is optimally 
sparse. Commun. on Pure and Appl. Math., 58(11):1472-1528, 2005. 

[11] E. Candes, L. Demanet, D. Donoho, and L. Ying. Fast discrete curvelet transforms. Multiscale 
Modeling & Simulation, 5(3):861-899, 2006. 

[12] E. Candes, J. Romberg, and T. Tao. Signal recovery from incomplete and inaccurate measure- 
ments. Commun. on Pure and Appl. Math., 59(8):1207?-1223, 2005. 

[13] E. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction 
from highly incomplete frequency information. IEEE Trans. Info. Theory, 52(2):489-509, 2006. 

[14] E. Candes and T. Tao. Near-optimal signal recovery from random projections: Universal 
encoding strategies? IEEE Trans. Info. Theory, 52(12) :5406-5425, 2006. 

[15] E. J. Candes, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted LI minimization. 
To appear in J. Fourier Anal. Appl., 2007. 

[16] C. Castro and E. Zuazua. Concentration and lack of observability of waves in highly hetero- 
geneous media. Arch. Rat. Mech. Anal, 164:39-72, 2002. 

[17] A. Chambolle. An algorithm for total variation minimization and applications. Journal of 
Mathematical Imaging and Vision, 20:89-97, 2004. 

[18] S. S. Chen, D.L. Donoho, and M.A. Saunders. Atomic decomposition by basis pursuit. SIAM 
Journal on Scientific Computing, 20(1):33-61, 1998. 

[19] A. Cohen, W. Dahmen, and R. DeVore. Adaptive wavelet methods for elliptic operator equa- 
tions: convergence rates. Math. Comp., 70(233) :27-75, 2000. 

[20] P. L. Combettes and V. R. Wajs. Gsignal recovery by proximal forward-backward splitting. 
SIAM Journal on Multiscale Modeling and Simulation, 4(4), 2005. 

[21] F. Conrad, J. Leblond, and J. P. Marmorat. Boundary control and stabilization of the one- 
dimensional wave equation. Lecture Notes in control and Inform. Sci., 178:142-162, 1992. 

[22] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse 
problems with a sparsity constraint. Commun. on Pure and Appl. Math., 57:1413-1541, 2004. 

[23] R. Dautray and J.L. Lions. Mathematical Analysis and Numerical Methods for Science and 
Technology. Springer- Verlag, 1990. 

[24] L. Demanet and L. Ying. Discrete symbol calculus, submitted, 2008. 

[25] L. Demanet and L. Ying. Wave atoms and time upscaling of wave equations. To appear in 
Numer. Math., 2008. 

[26] R. A. DeVore. Nonlinear approximation. Acta Numerica, 7:51-150, 1998. 



43 



[27] R. A. DcVorc and G. G. Lorentz. Constructive approximation, volume 303 of Grundlehren der 
math. Wissenschaften. Springer, 1993. 

[28] D. Donoho. Compressed sensing. IEEE Trans. Info. Theory, 52(4): 1289-1306, 2006. 

[29] D. L. Donoho and P. B. Stark. Uncertainty principles and signal recovery. SIAM J. of Appl. 

Math., 49(3):906-931, June 1989. 

[30] D. L. Donoho and Y. Tsaig. Fast solution of ^^-norm minimization problems when the solution 
may be sparse, to appear in Journal of Mathematical Imaging and Vision, 2006. 

[31] W. Yin E. T. Hale and Y. Zhang. A fixed-point continuation method for U-regularized mini- 
mization with applications to compressed sensing. CAAM Technical Report TR07-07, 2008. 

[32] B. Efron, T. Hastie, I. Johnstone, and T. Tibshirani. Least angle regression. Annals of 

Statistics, 32(2):407-499, 2004. 

[33] B. Engquist, S. Osher, and S. Zhong. Fast wavelet based algorithms for evolution equations. 
SIAM J. Sci. Comput, 15(4):755-775, 1994. 

[34] Y. Erlangga and R. Nabben. Multilevel projection-based nested krylov iteration for boundary 
value problems, 2008. 

[35] M. Figueiredo and R. Nowak. An EM Algorithm for Wavelet-Based Image Restoration. IEEE 
Trans. Image Proc, 12(8):906-916, 2003. 

[36] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient projection for sparse recon- 
struction: Application to compressed sensing and other inverse problems. IEEE Journal of 
Selected Topics in Signal Processing, l(4):586-598, 2007. 

[37] J-P. Fouque, J. Garnier, G. Papanicolaou, and K. Solna. Wave Propagation and Time Rever- 
sal in Randomly Layered Media, volume 56 of Stochastic Modelling and Applied Probability. 
Springer, 2007. 

[38] I. F. Gorodnitsky and B. D. Rao. Sparse signal reconstruction from limited data using FO- 
CUSS: a re-weighted minimum norm algorithm. IEEE Trans. Image Proc, 45(3):600-616, 
March 1997. 

[39] P. W. Jones, M. Maggioni, and R. Schul. Universal local parametrizations via heat kernels 
and eigenfunctions of the laplacian. Submitted, 2008. 

[40] R.M. Lewis and W.W. Symes. On the relation between the velocity coefficient and boundary 
value for solutions of the one-dimensional wave equation. Inverse Problems, 7(35):597-631, 
1991. 

[41] T. Lin and F. Herrmann. Compressed wavefield extrapolation. Geophysics, 72(5):77-93, 2007. 

[42] J. L. Lions and E. Magcnes. N on- Homogeneous Boundary Value Problems and Applications. 
Stochastic Modelling and Applied Probability. Springer- Verlag, 1972. 

[43] S. Mallat. A wavelet tour of signal processing, 3rd edition. Academic Press, San Diego, USA, 
2008. 

[44] Y. Meyer. Wavelets and Operators. Analysis at Urbana, London Math. Soc. Lecture Notes 
Series 137. Cambridge University Press, 1989. 



44 



[45] D. Needell and J. A. Tropp. Cosamp: Iterative signal recovery from incomplete and inaccurate 
samples, to appear in Appl. Comp. Harmonic Anal, 2008. 

[46] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1, Ser. 
A):127-152, 2005. 

[47] H. Owhadi and L. Zhang. Numerical homogenization of the acoustic wave equations with a 
continuum of scales, 2008. 

[48] R. E. Plcssix. A review of the adjoint-state method for computing the gradient of a functional 
with geophysical applications. Geophysical Journal International, 167(9) :495-503, 2006. 

[49] M. Rudelson and R.Vershynin. On sparse reconstruction from fourier and gaussian measure- 
ments. Commun. on Pure and Appl. Math., 61(8):1025-1045, 2008. 

[50] F. Santosa and W. W. Symes. Linear inversion of band-limited reflection seismograms. SIAM 

Journal on Scientific and Statistical Computing, 7(4): 1307-1330, 1986. 

[51] H. F. Smith. A parametrix construction for wave equations with C^'^ coefficients. Ann. Inst. 

Fourier (Grenoble), 48:797-835, 1998. 

[52] H. F. Smith. Spectral cluster estimates for C^'^ metrics. Amer. J. Math., 128:1069-1103, 2006. 

[53] C. D. Sogge. Eigcnfunction and bochner-riesz estimates on manifolds with boundary. Math. 
Research Letters, 9:205-216, 2002. 

[54] J.-L. Starck, M. Elad, and D.L. Donoho. Redundant multiscale transforms and their applica- 
tion for morphological component analysis. Advances in Imaging and Electron Physics, 132, 
2004. 

[55] C. C. Stolk. PhD thesis, On the Modeling and Inversion of Seismic Data. PhD thesis, Utrecht 
University, 2000. 

[56] W. W. Symes. Transparency of one-dimensional acoustic media of bounded variation. TRIP 

technical report, 1985. 

[57] W. W. Symes. Reverse time migration with optimal checkpointing. Geophysics, 72(5):213-221, 
2007. 

[58] W. Yin, S. Osher, J. Darbon, and D. Goldfarb. Bregman iterative algorithms for compressed 
sensing and related problems. CAAM TR07-13, 2007. 

[59] M. Zhu and T. Chan. An efficient primal-dual hybrid gradient algorithm for total variation 
image restoration. UCLA CAM Report 08-34, 2007. 

[60] W. Zicmer. Weakly Differentiable Functions. Graduate texts in Mathematics. Springer- Verlag, 
1989. 

[61] E. Zuazua. Exact controllability for semilinear wave equations in one space dimension. Annales 
I.H.P. section C, 10(1):109-129, 1993. 

[62] E. Zuazua. Propagation, observation, and control of waves approximated by finite difference 
methods. SIAM Rev., 47(2):197-243, 2005. 



45 



